# Introduction
- Puppeteer is a Node.js library that does web crawling by driving an automated Chromium browser (the open-source project behind Chrome)
- Unlike other approaches, where you send raw HTTP requests, parse the responses yourself, and even handle cookies manually, the workflow here is simply:
- Launch the browser -> emulate real user operations on the web page -> get any data that a real user can reach
Web crawling with Puppeteer is very simple; the minimal sketch below shows the whole flow.
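To make those three steps concrete, here is a minimal sketch that launches a browser, visits a page, and reads its title from inside the page context. The URL is just a placeholder; replace it with the page you actually want to crawl:

const puppeteer = require('puppeteer');

(async () => {
  // Step 1: launch an automated Chromium browser.
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Step 2: operate on the page like a real user would (here we simply navigate to it).
  await page.goto("https://example.com");

  // Step 3: read any data a real user could see, e.g. the page title.
  const title = await page.evaluate(() => document.title);
  console.log(title);

  await browser.close();
})();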
This is part 1 of 2; in part 2 I'll demonstrate how to make it work on Android.
# Getting Started
Start by installing:

npm install puppeteer
Then use a simple example_code.js like the one below, which automatically fills in the account name and password, logs in, and fetches data that is only reachable behind the logged-in account:
const puppeteer = require('puppeteer');

(async () => {
  // Launch the Puppeteer-controlled browser.
  // With headless: false, the Chromium browser window is visible, which helps while developing.
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  try {
    // Go to the login page's URL and sign in automatically.
    await page.goto("http://your.login.page/url", { waitUntil: "networkidle0" });

    // Start waiting for the post-login navigation *before* clicking Login,
    // so the navigation triggered by the click is not missed (default timeout: 30 seconds).
    const navigationPromise = page.waitForNavigation();

    let data = await page.evaluate(() => {
      // This code runs in the context of the page, the same context you get when you
      // right-click the page and choose Inspect -> Console,
      // so you can try the code in your browser's Console first.
      // Account_Field_ID_Name and Password_Field_ID_Name are the IDs of the login form fields
      // (browsers expose element IDs as global variables).
      Account_Field_ID_Name.value = "Your Account Name";
      Password_Field_ID_Name.value = "Your Password";
      document.getElementsByName("Login_Button_Name")[0].click();
      return "The data you want to return from the current page to your node.js app";
    });

    // Wait until the browser has navigated to the logged-in page.
    await navigationPromise;
    console.log(data);

    // Read data from the page behind your logged-in account.
    data = await page.evaluate(() => {
      return Some_Text_Fields_ID.value;
    });
    console.log(data);
  } catch (e) {
    console.error(e);
  }
  // Uncomment this line once you've finished developing and want the browser to close at the end.
  // await browser.close();
})();
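If you prefer not to reach into the page with page.evaluate, Puppeteer also provides its own helpers for typing and clicking. The sketch below shows the same login flow using them; the CSS selectors (#account, #password, #login, #some-text-field) are placeholders I've made up, so replace them with the real selectors from your login page:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto("http://your.login.page/url", { waitUntil: "networkidle0" });

  // Type into the login form using Puppeteer's helpers instead of page.evaluate.
  // The selectors below are placeholders for your actual form fields.
  await page.type("#account", "Your Account Name");
  await page.type("#password", "Your Password");

  // Click Login and wait for the resulting navigation inside the same Promise.all,
  // so the navigation cannot finish before we start waiting for it.
  await Promise.all([
    page.waitForNavigation(),
    page.click("#login"),
  ]);

  // Read a value from the page behind the login; the selector is again a placeholder.
  const text = await page.$eval("#some-text-field", el => el.value);
  console.log(text);

  await browser.close();
})();

Either way, save the script as example_code.js and run it with node example_code.js.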