Optimizing Gherkin scenario step definitions

When writing test cases in Gherkin, the larger your set of scenario descriptions gets, the easier it is to create redundant step definitions.

A different word order or the use of synonyms leads to more and more step definitions to manage.

One tool to get an overview of redundancies is the following Python script. It searches a directory passed as a parameter and all its subdirectories, scans the JavaScript (.js) files for step definitions, and finally writes them, together with the file name and line number, alphabetically sorted to a text file.

With similar step definitions listed next to each other, redundancies are much easier to spot.

The Python script file gherkinheader.py:

import os
import re
import sys

# check if the source directory is entered as parameter
if len(sys.argv) != 2:
    print("Usage: python gherkinheader.py <root_directory>")
    sys.exit(1)

# take the first parameter as source directory
root_dir = sys.argv[1]

# create a list to store the results
headers = []

# regex pattern to identify Given/When/Then headers
pattern = re.compile(r'\b(Given|When|Then)\s*\(.*\s*=>\s*\{')

# traverse the source directory and all subdirectories
for subdir, _, files in os.walk(root_dir):
    for file in files:
        if file.endswith('.js'):
            file_path = os.path.join(subdir, file)
            with open(file_path, 'r', encoding='utf-8') as f:
                lines = f.readlines()
            for i, line in enumerate(lines, start=1):
                match = pattern.search(line)
                if match:
                    header = match.group(0)
                    # store header, file name and line number
                    headers.append(f"{header}: {file}:{i}")

# remove duplicates and sort alphabetically
headers = sorted(set(headers))

# write sorted headers into file
with open("gherkin_headers.txt", 'w', encoding='utf-8') as output_file:
    for header in headers:
        output_file.write(header + '\n')

print("Found Given/When/Then headers were written to 'gherkin_headers.txt'.")

The Python script is called like this:

python gherkinheader.py <sourcedirectory>
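To see what ends up in gherkin_headers.txt, the matching step can be tried on a few lines in isolation. The following is only a minimal sketch: the sample lines and the file name steps.js are made up for illustration, and extract_headers is a hypothetical helper that repeats the inner loop of the script with the same regex.

```python
import re

# Same pattern as in gherkinheader.py
pattern = re.compile(r'\b(Given|When|Then)\s*\(.*\s*=>\s*\{')

# Made-up lines as they might appear in a step file (assumption, not real project code)
sample_lines = [
    "When('the user logs in', async () => {",
    "Given('the user is logged in', () => {",
    "Then('the user is logged in', function () {",
    "// Given a comment without parentheses",
]


def extract_headers(lines, file_name):
    """Hypothetical helper: return sorted 'header: file:line' entries."""
    found = []
    for i, line in enumerate(lines, start=1):
        match = pattern.search(line)
        if match:
            found.append(f"{match.group(0)}: {file_name}:{i}")
    return sorted(set(found))


for entry in extract_headers(sample_lines, "steps.js"):
    print(entry)
```

Only the first two sample lines are listed, sorted so that the Given entry comes before the When entry. Because the pattern requires "=> {", the Then line with a classic function () callback is not matched; if a project uses that style, the regex would need to be adapted.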