In many scenarios the data you want to scrape is only available after logging in. To reach the page where the data is located, your web scraper needs code that automatically submits a username/email and password to log in to the website; once logged in, it can crawl and parse pages as required.
Many third-party web scraping applications let you specify the login URL and the login parameters; the login task then runs when the scraper starts, before any scraping is done.
Below is a C# example of logging in programmatically to a demo login page:
http://demo.webdata-scraping.com/login.php
Below is the HTML code of the login form:
<form class="form-signin" id="login" method="post" role="form">
<h3 class="form-signin-heading">Please sign in</h3>
<a href="#" id="flipToRecover" class="flipLink">
<div id="triangle-topright"></div>
</a>
<input type="email" class="form-control" name="loginEmail" id="loginEmail" placeholder="Email address" required autofocus>
<input type="password" class="form-control" name="loginPass" id="loginPass" placeholder="Password" required>
<button class="btn btn-lg btn-primary btn-block" name="login_submit" id="login_submit" type="submit">Sign in</button>
</form>
In this code, notice that the email input box has id="loginEmail" and the password input box has id="loginPass".
Using these IDs, we look up each element with the WebBrowser control's Document.GetElementById method and fill in the value of each input box using the following code:
// Run this only after the login page has finished loading; Document is null before that.
webBrowser1.Document.GetElementById("loginEmail").InnerText = textBox1.Text;
webBrowser1.Document.GetElementById("loginPass").InnerText = textBox2.Text;
After filling in the email and password input boxes, we simply trigger the click event of the submit button, which is labeled "Sign in":
webBrowser1.Document.GetElementById("login_submit").InvokeMember("click");
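The snippets above only work once the login page has loaded into the control. Here is a minimal sketch of how the pieces might fit together in a Windows Forms application, assuming a form with a WebBrowser named webBrowser1 and two text boxes named textBox1 and textBox2 as in the snippets. The names LoginForm, LoginUrl, loginButton, loginRequested and OnDocumentCompleted are hypothetical and used only for illustration.

```csharp
using System;
using System.Windows.Forms;

// Sketch only: LoginForm, LoginUrl, loginButton, loginRequested and
// OnDocumentCompleted are hypothetical names, not taken from the original post.
public class LoginForm : Form
{
    private const string LoginUrl = "http://demo.webdata-scraping.com/login.php";

    private readonly WebBrowser webBrowser1 = new WebBrowser { Dock = DockStyle.Fill };
    private readonly TextBox textBox1 = new TextBox { Dock = DockStyle.Top }; // email
    private readonly TextBox textBox2 =
        new TextBox { Dock = DockStyle.Top, UseSystemPasswordChar = true };   // password
    private readonly Button loginButton = new Button { Text = "Log in", Dock = DockStyle.Top };

    // True between the button click and the form submission, so the form
    // is submitted only once per click even though the page reloads afterwards.
    private bool loginRequested;

    public LoginForm()
    {
        Controls.Add(webBrowser1);
        Controls.Add(loginButton);
        Controls.Add(textBox2);
        Controls.Add(textBox1);

        webBrowser1.DocumentCompleted += OnDocumentCompleted;
        loginButton.Click += (sender, e) =>
        {
            loginRequested = true;
            webBrowser1.Navigate(LoginUrl);
        };
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        if (!loginRequested || webBrowser1.Document == null) return;

        HtmlElement email = webBrowser1.Document.GetElementById("loginEmail");
        HtmlElement password = webBrowser1.Document.GetElementById("loginPass");
        HtmlElement submit = webBrowser1.Document.GetElementById("login_submit");

        // GetElementById returns null when the element is not on the page.
        if (email == null || password == null || submit == null) return;

        loginRequested = false;
        email.InnerText = textBox1.Text;
        password.InnerText = textBox2.Text;
        submit.InvokeMember("click");
    }
}

public static class Program
{
    [STAThread] // the WebBrowser control requires a single-threaded apartment
    public static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new LoginForm());
    }
}
```

The form is filled in from the DocumentCompleted handler because Navigate returns before the page has loaded. The loginRequested flag is one simple way to avoid resubmitting if the login page is shown again after a failed attempt.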
This is a very basic example of how to log in to a website programmatically when the data you need is only available after login. Using the WebBrowser control is a very simple way to do this, but there are other ways to achieve the same thing.
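One such alternative, not covered in the original example, is to send the login form directly over HTTP without a browser control. The sketch below uses HttpClient with a cookie container so that a session cookie, if the site sets one, is kept for later requests. The field names come from the name attributes in the form above. The form has no action attribute, so a browser would post it back to the login page's own URL. LoginClient, BuildLoginFields and LoginAsync are hypothetical names, and it is an assumption that the demo server accepts a plain form post with these three fields.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Sketch only: LoginClient, BuildLoginFields and LoginAsync are hypothetical names.
public static class LoginClient
{
    // The form has no action attribute, so it posts back to the login page itself.
    public const string LoginUrl = "http://demo.webdata-scraping.com/login.php";

    // Builds the form fields, keyed by the name attributes from the login form's HTML.
    public static Dictionary<string, string> BuildLoginFields(string email, string password)
    {
        return new Dictionary<string, string>
        {
            ["loginEmail"] = email,
            ["loginPass"] = password,
            // Assumption: the server may look for the submit button's name.
            ["login_submit"] = ""
        };
    }

    // Posts the credentials and returns a client whose cookie container
    // holds whatever cookies the server set in its response.
    public static async Task<HttpClient> LoginAsync(string email, string password)
    {
        var handler = new HttpClientHandler { CookieContainer = new CookieContainer() };
        var client = new HttpClient(handler);

        using (var content = new FormUrlEncodedContent(BuildLoginFields(email, password)))
        using (HttpResponseMessage response = await client.PostAsync(LoginUrl, content))
        {
            // Throws on a non-2xx status; a 2xx status alone does not prove the login worked.
            response.EnsureSuccessStatusCode();
        }

        return client;
    }
}
```

A caller would await LoginClient.LoginAsync(email, password) and then reuse the returned client for the pages behind the login. Because a successful status code does not guarantee that the credentials were accepted, it is worth checking a returned page for content that only appears after login.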
Source: http://webdata-scraping.com/login-website-programmatically-using-c-web-scraping/