When writing test cases in Gherkin, the larger your set of scenario descriptions gets, the easier it is to create redundant step definitions.

Word order, using synonyms, all this leads to more and more step definitions to manage.

One tool to get an overview of redundancies is the following Python script. it search through a directory entered as a parameter and all its subdirectories, scans the files for step definitions and finally lists these with the file and line number alphabetically sorted in a text file.

Now a lot easier to get an overview of redundancies.

The Python script file gherkinheader.py

import os
import re
import sys

# check if the source directory is entered as parameter
if len(sys.argv) != 2:
print("Usage: python extract_gherkin_headers.py <root_directory>")
sys.exit(1)

# take the first parameter as sourc directory
root_dir = sys.argv[1]

# create a list to store the results
headers = []

# Regex-Pattern to identify Given/When/Then-Header
pattern = re.compile(r'\b(Given|When|Then)\s*\(.*\s*=>\s*\{')

# traverse the source diretory and all subdirectories
for subdir, _, files in os.walk(root_dir):
for file in files:
if file.endswith('.js'):
file_path = os.path.join(subdir, file)
withopen(file_path,'r',encoding='utf-8') as f:
lines = f.readlines()
for i, line inenumerate(lines,start=1):
match = pattern.search(line)
if match:
header = match.group(0)
# store Header, file name and line number 
headers.append(f"{header}: {file}:{i}")

# remove duplicates and sort alphabetically
headers = sorted(set(headers))

# write sorted headers into file
with open("gherkin_headers.txt", 'w', encoding='utf-8') as output_file:
for header in headers:
output_file.write(header + '\n')

print(f"Found Given/When/Then-Header are written to 'gherkin_headers.txt'.")
 
The Python script is called like
 
python gherkinheader.py <sourcedirectory>