
🤖 AI Driven Test Automation with Playwright and Midscene

Let's explore AI-driven test execution with Midscene and OpenAI, and find out what is good, bad and uncertain about the future of test automation

Introduction

There are quite a few AI solutions on the market, but most are paid or require a subscription. I wanted to explore some open-source options, stumbled across Midscene on GitHub and decided to try it out. Auto-Playwright is also fairly popular, and I'll most likely explore that in the future as a comparison.

Why open-source? I like the fact that you're only paying for the OpenAI calls being made; there are no additional charges, so you're in full control of your spending and usage.

The tech

The Midscene documentation and example projects will get you started; it supports integration with both Playwright and Puppeteer.

You will also need an OpenAI API key with credit on the account; you can make a deposit in the billing section of your OpenAI settings.

AI Automation in action

So let's start with how it looks. Below is an HTML report that is auto-generated at the end of the test run. You can see the 'vision' of the AI tooling and the interactions it makes with the page, and there is a nice menu on the left side where you can step through the execution and see more detail.

[Animated recording of the auto-generated Midscene report]

Let's see the code

If you clone the Midscene Playwright example project, you'll have everything you need to get started.

Add a .env file to your project with a reference to your OpenAI API key.

# .env file
OPENAI_API_KEY=" "
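
The example project already wires the key up for you. If you're integrating Midscene into your own Playwright setup instead, one common approach is loading the .env file from your Playwright config with the dotenv package - a minimal sketch (the file layout and timeout below are my assumptions, not taken from the example project):

// playwright.config.ts - a minimal sketch, assuming the dotenv package is installed
import { defineConfig } from "@playwright/test";
import "dotenv/config"; // loads OPENAI_API_KEY from the .env file into process.env

export default defineConfig({
  testDir: "./e2e",
  // AI-driven steps are slow (see the timings further down), so allow a generous timeout
  timeout: 120 * 1000,
});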

These are the four AI actions that can be used:

  • .aiAction(steps: string) or .ai(steps: string) - interact with the page
  • .aiQuery(dataDemand: any) - extract data from the page
  • .aiAssert(assertion: string, errorMsg?: string) - make an assertion
  • .aiWaitFor(assertion: string, { timeoutMs?: number, checkIntervalMs?: number }) - wait until the assertion is met
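
The test below only uses ai and aiQuery, so here's a rough sketch of the other two actions. It assumes the fixture exposes aiAssert and aiWaitFor in the same way as ai and aiQuery, and the page and prompts are made up purely for illustration:

import { test } from "./fixture";

test("wait and assert with AI (sketch)", async ({ page, ai, aiAssert, aiWaitFor }) => {
  await page.goto("https://example.com/search"); // hypothetical page

  await ai('type "playwright" into the search box and press Enter');

  // poll until the AI judges the assertion to be true, or fail after the timeout
  await aiWaitFor("at least one search result is visible", {
    timeoutMs: 30000,
    checkIntervalMs: 3000,
  });

  // a plain assertion, with a custom message to show if it fails
  await aiAssert("the first search result mentions Playwright", "expected a Playwright result");
});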

The test code is fairly simple: we're going to BBC Sport and asserting that Premier League matches are visible. The ai command allows us to interact with the page, and aiQuery allows us to extract data from it.

import { expect } from "@playwright/test";
import { test } from "./fixture";
 
test.beforeEach(async ({ page }) => {
  await page.setViewportSize({ width: 1280, height: 800 });
  await page.goto("https://www.bbc.co.uk/sport");
});
 
test("find and output all premier league matches", async ({
  ai,
  aiQuery,
}) => {
  await ai('click the football tab which is next to "Home"');
 
  await ai('click "Scores & Fixtures" which is under the sports tabs');
 
  await ai('scroll the page until all "Premier League" games are in view underneath the "Premier League" heading, stop when another group or league is visible');
 
  const matches = await aiQuery(
    "string[], find all 'Premier League' matches on the page and the scores, output them in the match format"
  );
 
  console.log("these are the team names and scores", matches);
  expect(matches?.length).toBeGreaterThan(0);
});
 

This is the console output for the test; you can see the log from aiQuery. The test took 55.2 seconds to execute, which is pretty slow - an equivalent plain Playwright test took around 5-10 seconds.

these are the team names and scores [
  'Chelsea 3 - 0 Aston Villa',
  'Manchester United 4 - 0 Everton',
  'Tottenham Hotspur 1 - 1 Fulham',
  'Liverpool 2 - 0 Manchester City'
]
 
Midscene - report file updated: C:\Users\garyp\git\midscene-example\playwright-demo\midscene_run\report\playwright-merged-2024-12-01_19-07-48-680.html
  Slow test file: [chromium] › bbc.spec.ts (51.8s)
  Consider splitting slow test files to speed up parallel execution
  1 passed (55.2s)
Midscene - report file updated: C:\Users\garyp\git\midscene-example\playwright-demo\midscene_run\report\playwright-merged-2024-12-01_19-07-48-680.html
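
For comparison, a plain Playwright version of the same journey might look something like the sketch below. The selectors are illustrative guesses rather than the real BBC Sport markup - which is exactly the kind of brittle detail the AI version lets you avoid:

import { test, expect } from "@playwright/test";

test("find and output all premier league matches (plain Playwright)", async ({ page }) => {
  await page.setViewportSize({ width: 1280, height: 800 });
  await page.goto("https://www.bbc.co.uk/sport");

  // illustrative selectors only - the real page structure would need inspecting,
  // and any markup change would break this test
  await page.getByRole("link", { name: "Football" }).first().click();
  await page.getByRole("link", { name: "Scores & Fixtures" }).first().click();

  const premierLeague = page.locator("section", {
    has: page.getByRole("heading", { name: "Premier League" }),
  });
  const matches = await premierLeague.locator("li").allInnerTexts();

  console.log("these are the team names and scores", matches);
  expect(matches.length).toBeGreaterThan(0);
});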

The good

  • When you nail down the quality of your prompts, writing tests is very easy
  • You only need a foundational level of coding/automation experience to write tests
  • You can mix AI code with your existing framework/tests
  • The reporting is awesome, and watching your test execution back is like magic

The bad

  • It is slow - if your test usually takes 5-10 seconds, expect it to take 60-100 seconds with AI
  • As your user journey becomes more complex, so do your prompts (there were also some scenarios that didn't work, and I had to simplify the test)
  • You are fully dependent on OpenAI connectivity, so running tests in other environments or infrastructure could become a problem
  • If the UI or journey changes, you will have to rewrite your prompts and test structure

The uncertain

  • Your test is only as good as your ability to write good prompts. If you're bad at structuring prompts, you're going to have a bad time - tests will not work, or will be very flaky
  • Tests are no longer 'free' to execute, although they are still fairly cheap - my test executions cost about $0.07 per run
  • I felt I had to keep checking what the test was actually doing, because of the ambiguity of the 'code'
  • I'm not sure how much time was actually saved - writing the test was certainly quicker, but focusing on detailed prompts, reviewing the execution and debugging all took additional time

Conclusion

It's definitely an exciting time for the test automation industry and I'm looking forward to the improvements to come - although for now, I don't think I could justify migrating from a code-based framework to an AI-driven one.