When writing test cases in Gherkin, the larger your set of scenario descriptions gets, the easier it is to create redundant step definitions.
Variations in word order and the use of synonyms lead to more and more step definitions to manage.
One tool to get an overview of redundancies is the following Python script. It searches a directory passed as a parameter and all its subdirectories, scans the JavaScript (.js) files for step definitions, and finally writes them, together with file name and line number, alphabetically sorted to a text file.
With this list, it is a lot easier to spot redundancies.
The Python script file gherkinheader.py:
import os
import re
import sys

# check that the source directory was passed as a parameter
if len(sys.argv) != 2:
    print("Usage: python gherkinheader.py <root_directory>")
    sys.exit(1)

# take the first parameter as the source directory
root_dir = sys.argv[1]

# create a list to store the results
headers = []

# regex pattern to identify Given/When/Then headers
pattern = re.compile(r'\b(Given|When|Then)\s*\(.*\s*=>\s*\{')

# traverse the source directory and all subdirectories
for subdir, _, files in os.walk(root_dir):
    for file in files:
        if file.endswith('.js'):
            file_path = os.path.join(subdir, file)
            with open(file_path, 'r', encoding='utf-8') as f:
                lines = f.readlines()
                for i, line in enumerate(lines, start=1):
                    match = pattern.search(line)
                    if match:
                        header = match.group(0)
                        # store header, file name and line number
                        headers.append(f"{header}: {file}:{i}")

# remove duplicates and sort alphabetically
headers = sorted(set(headers))

# write the sorted headers to a file
with open("gherkin_headers.txt", 'w', encoding='utf-8') as output_file:
    for header in headers:
        output_file.write(header + '\n')

print("Found Given/When/Then headers were written to 'gherkin_headers.txt'.")

The Python script is called like this:
python gherkinheader.py <sourcedirectory>
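
To get a feeling for what the script picks up, here is a small sketch that runs the same regex against a few sample lines. The sample step definitions and the helper name extract_headers are made up for illustration and are not part of the script. Note that the pattern only matches arrow-function callbacks (=> {), so a step defined with the function keyword would not show up in the list.

```python
import re

# Same pattern as in gherkinheader.py.
pattern = re.compile(r'\b(Given|When|Then)\s*\(.*\s*=>\s*\{')

# Hypothetical step definition lines, invented for this example.
sample_lines = [
    "Given('the user is logged in', () => {",
    "When('the user clicks {string}', (label) => {",
    "Then('the page title is {string}', async (title) => {",
    "  cy.get('button').click();",
    "Then('the result is shown', function () {",
]


def extract_headers(lines):
    """Return (line_number, header) pairs for lines matching the pattern."""
    found = []
    for number, line in enumerate(lines, start=1):
        match = pattern.search(line)
        if match:
            found.append((number, match.group(0)))
    return found


headers = extract_headers(sample_lines)
for number, header in headers:
    print(f"{header}: line {number}")
```

The first three lines are reported. The fourth is ordinary code, and the fifth uses function () { instead of an arrow function, so neither matches. If your step files use the function style, the regex would need to be extended accordingly.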