Codehike Scrollycoding Example

AgentQL is a robust query language that identifies elements on a webpage using natural language with the help of AI. It can be used to automate tasks on the web, extract data, and interact with websites in real-time. AgentQL uses AI to infer which element or data you mean based on term names and query structure, so it will find what you're looking for even if the page's markup and layout change drastically.

AgentQL's Python SDK allows you to write Python scripts that identify elements and extract data from the web using the AgentQL query language. In this guide, you will learn how to use AgentQL queries and the SDK to automate page interactions and data extraction from the page.

Prerequisites

Instructions

The script below will open a browser and do the following:

  1. Navigate to scrapeme.live/shop.
  2. Enter "fish" into the search field in the header section.
  3. Press the Enter key to perform the search.
  4. Close the browser after 10 seconds.

Step 0: Create a New Python Script

In your project folder, create a new Python script and name it example_script.py.

Step 1: Import Required Libraries

Import the required functions and classes from the playwright library, and import the agentql library.

Playwright is an end-to-end browser automation and testing tool. In this example, it opens the browser and interacts with the elements that AgentQL returns.
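
Before importing, it can help to confirm that both libraries are available in the active Python environment. The following is a minimal sketch using only the standard library; the helper name missing_packages is hypothetical and not part of either SDK, and the package names are assumed to match the import names used in this guide.

```python
import importlib.util


def missing_packages(names=("agentql", "playwright")):
    """Return the names from `names` that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The imports this guide's script relies on:
#   import agentql
#   from playwright.sync_api import sync_playwright
```

An empty list from missing_packages() suggests both imports should succeed.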

Step 2: Launch the Browser and Open the Website

The last preparation step is launching the browser and navigating to the target website, using the standard Playwright API. The only difference is the type of the page: instead of using Playwright's Page class directly, you wrap the page with agentql.wrap() and get AgentQL's Page class, which is the main interface both for interacting with the web page and for executing AgentQL queries.

tip
  • The default AgentQL SDK implementation is built on top of Playwright and uses its functionality for interacting with the browser, the page, and elements on the page.
  • By default, Playwright launches the browser in headless mode. Here, headless is explicitly set to False for the sake of the example.
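
As a rough sketch of this step, the wrapping and navigation can be grouped into one helper. The names open_shop_page and SHOP_URL are hypothetical, and wrap is passed in as a parameter (expected to be agentql.wrap) purely so the helper can be tried without a real browser; the guide's own script calls agentql.wrap() directly.

```python
SHOP_URL = "https://scrapeme.live/shop/"


def open_shop_page(browser, wrap, url=SHOP_URL):
    """Open a new page, wrap it for AgentQL, and navigate to `url`.

    `browser` is a Playwright Browser; `wrap` is expected to be agentql.wrap.
    """
    page = wrap(browser.new_page())  # the wrapped page can run AgentQL queries
    page.goto(url)
    return page


# Usage inside the script from this guide:
#   with sync_playwright() as playwright, playwright.chromium.launch(headless=False) as browser:
#       page = open_shop_page(browser, agentql.wrap)
```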

Step 3: Define AgentQL Query

AgentQL queries are how you query elements 'from' a web page. A query describes the elements you want to interact with or consume content from and defines your desired output structure.

In this query, we specify the element we want to interact with on "https://scrapeme.live/shop/":

  • search_box - search input field
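
In Python, the query is an ordinary multi-line string. The sketch below defines the same query the script uses and, for illustration only, lists its term names with a naive line-based scan; that scan is an assumption that holds for a flat query like this one and is not how AgentQL parses queries.

```python
# The query is plain text: one term per element, inside curly braces.
QUERY = """
{
    search_box
}
"""

# Naive illustration: collect the term names of a flat, one-level query.
terms = [
    line.strip()
    for line in QUERY.splitlines()
    if line.strip() not in ("", "{", "}")
]
print(terms)  # ['search_box']
```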

Step 4: Execute AgentQL Query

AgentQL's Page extends Playwright's Page class with querying capabilities.

The response variable has the same structure as the AgentQL query, i.e. it has one field: search_box. This field is either None, if the described element was not found on the page, or an instance of the Locator class, which lets you interact with the found element.
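
Because search_box may be None, it can be worth checking the response before using it. The following is a minimal sketch of such a check; the helper name find_search_box and the choice of LookupError are illustrative assumptions, not part of the SDK.

```python
QUERY = """
{
    search_box
}
"""


def find_search_box(page):
    """Run the query and return the search_box locator, or raise if it is missing."""
    response = page.query_elements(QUERY)
    if response.search_box is None:
        # The described element was not found on the page.
        raise LookupError("search_box was not found on the page")
    return response.search_box
```

Failing early here gives a clearer error than an attribute error on None in the next step.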

Step 5: Interact with Web Page

The type method is called on the search_box element found in the previous step; it mimics typing "fish" into the search box. Then the press method is called on the keyboard attribute of the page with "Enter" as the key, simulating a press of the Enter key.
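
The two calls in this step can be sketched as a small helper. The name search_for is hypothetical; the type and keyboard.press calls are the ones the guide's script uses.

```python
def search_for(page, search_box, term):
    """Type `term` into the located search box and submit it with Enter."""
    search_box.type(term)         # simulate typing into the input field
    page.keyboard.press("Enter")  # press() receives the key name as an argument


# Usage inside the script from this guide:
#   search_for(page, response.search_box, "fish")
```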


Step 6: Pause the Script Execution

Here, the page.wait_for_timeout() method pauses execution for 10 seconds (10000 milliseconds) so you can see the effect of the script before the browser closes.

warning

page.wait_for_timeout() is used only for demo purposes and will impact the performance. Don't use it in production!
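
To make the distinction concrete, here is a minimal sketch that separates the demo pause from one possible alternative. The helper names pause_for_demo and wait_until_loaded are hypothetical, and using Playwright's wait_for_load_state() in place of a fixed delay is a suggestion under the assumption that waiting for the page's load event is enough for your use case.

```python
def pause_for_demo(page, seconds=10):
    """Demo only: keep the browser open for `seconds` before continuing."""
    page.wait_for_timeout(seconds * 1000)  # wait_for_timeout() takes milliseconds


def wait_until_loaded(page):
    """Possible alternative: wait for the page's load event instead of a fixed delay."""
    page.wait_for_load_state("load")
```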

Step 7: Stop the Browser

Finally, the close method is called on the browser object, ending the web browsing session. This is important for properly releasing resources.
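
When the browser is not managed by a with statement, one way to make sure close is still called after an error is a try/finally block. This is a sketch only; run_with_cleanup and task are hypothetical names introduced for illustration.

```python
def run_with_cleanup(browser, task):
    """Run `task(browser)` and close the browser even if the task raises."""
    try:
        return task(browser)
    finally:
        browser.close()  # always release the browser's resources
```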

import agentql
from playwright.sync_api import sync_playwright

with sync_playwright() as playwright, playwright.chromium.launch(headless=False) as browser:
    page = agentql.wrap(browser.new_page())
    page.goto("https://scrapeme.live/shop/")

    QUERY = """
    {
        search_box
    }
    """

    response = page.query_elements(QUERY)

    response.search_box.type("fish")
    page.keyboard.press("Enter")

    page.wait_for_timeout(10000)

    browser.close()

Step 8: Run the Script

Open a terminal in your project's folder and run the script:

terminal
python3 example_script.py