Python Script to Merge GitHub Repository Python Files into a Markdown File - eviltoast
import os

def get_python_files(directory):
    python_files = []
    for root, _dirs, files in os.walk(directory):
        for file in files:
            if file.endswith(".py"):
                python_files.append(os.path.join(root, file))
    # os.walk makes no ordering guarantee, so sort for reproducible output
    return sorted(python_files)

def read_file(file_path):
    with open(file_path, "r", encoding="utf-8") as file:
        contents = file.read()
    return contents

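def write_markdown(file_paths, output_file):
    # Overwrites output_file; each entry is a `name` line followed by a fenced block
    with open(output_file, "w", encoding="utf-8") as md_file:
        for file_path in file_paths:
            file_name = os.path.basename(file_path)
            md_file.write(f"`{file_name}`\n\n")
            md_file.write("```python\n")
            # Drop trailing newlines so no blank line appears before the closing fence
            md_file.write(read_file(file_path).rstrip("\n"))
            md_file.write("\n```\n\n")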

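def main():
    # Strip stray whitespace, e.g. from a pasted path
    github_repo_path = input("Enter the path to the GitHub repository: ").strip()
    if not os.path.isdir(github_repo_path):
        # os.walk silently yields nothing for a missing path, so fail loudly instead
        print(f"Error: {github_repo_path!r} is not a directory")
        return
    python_files = get_python_files(github_repo_path)
    output_file = "merged_files.md"
    write_markdown(python_files, output_file)
    print(f"Python files merged into {output_file}")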

if __name__ == "__main__":
    main()

Here’s how the script works:

  1. The get_python_files function takes a directory path and returns a list of all Python files (files ending with .py) found in that directory and its subdirectories.
  2. The read_file function reads the contents of a file and returns it as a string.
  3. The write_markdown function takes a list of file paths and an output file path. For each path, it reads the file and writes the file's name in backticks, followed by the file's contents inside a Markdown code block fenced with ```python and ```.
  4. The main function prompts the user for the path to a local clone of the GitHub repository, collects its Python files with get_python_files, writes them out with write_markdown, and prints a message confirming that the files were merged into the output file (merged_files.md).
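Two details of write_markdown are worth knowing. It labels each block with the file's base name only, so several __init__.py files from different packages end up with identical labels. And a source file containing a line that starts with three backticks (for example, a Markdown snippet inside a docstring) closes the fence early. The sketch below is one way to handle both: label each block with its path relative to the repository root, and pick a fence longer than any backtick run in the file, which works because CommonMark requires a closing fence to be at least as long as the opening one. The names write_markdown_with_paths and fence_for and the root parameter are hypothetical, not part of the original script.

```python
import os
import re


def fence_for(text):
    """Return a backtick fence longer than any run of backticks inside text."""
    longest = max((len(run) for run in re.findall(r"`+", text)), default=0)
    return "`" * max(3, longest + 1)


def write_markdown_with_paths(file_paths, output_file, root):
    """Like write_markdown, but label each block with its path relative to root."""
    with open(output_file, "w", encoding="utf-8") as md_file:
        for file_path in file_paths:
            # Relative, forward-slash paths keep same-named files distinguishable
            label = os.path.relpath(file_path, root).replace(os.sep, "/")
            with open(file_path, "r", encoding="utf-8") as src:
                contents = src.read().rstrip("\n")
            fence = fence_for(contents)
            md_file.write(f"`{label}`\n\n")
            md_file.write(f"{fence}python\n{contents}\n{fence}\n\n")
```

In main, this would be called as write_markdown_with_paths(python_files, output_file, github_repo_path).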

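To use the script, save it as a Python file (e.g., merge_python_files.py) and run it with Python. The script reads a local directory, so clone the repository first (for example with git clone) if you have not already, and enter the path to that clone when prompted. The script then creates merged_files.md in the current working directory (the directory you ran the command from), with one labeled code block per Python file. An existing merged_files.md in that directory is overwritten.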
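If you would rather pass the repository path as a command-line argument (handy for scripting or repeated runs) than type it at the prompt, a small argparse-based parser is one option. This is a sketch: parse_args and the -o/--output option are hypothetical additions, not part of the original script.

```python
import argparse


def parse_args(argv=None):
    """Parse the repository path and output file name from the command line."""
    parser = argparse.ArgumentParser(
        description="Merge all .py files under a directory into one Markdown file."
    )
    parser.add_argument("repo_path", help="path to a local clone of the repository")
    parser.add_argument(
        "-o",
        "--output",
        default="merged_files.md",
        help="Markdown file to write (default: merged_files.md)",
    )
    # argv=None makes argparse read sys.argv[1:]
    return parser.parse_args(argv)
```

To wire it in, main would call args = parse_args() and use args.repo_path and args.output in place of the input() call and the hard-coded file name; the script could then be run as python merge_python_files.py path/to/repo -o repo.md.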

Note: The script only collects files whose names end in .py; every other file in the repository is skipped. If you want to include other file types or exclude certain files or directories (such as virtual environments or vendored code), modify the get_python_files function accordingly.
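Following that note, here is one way get_python_files could be generalized: a hypothetical get_source_files that accepts a tuple of extensions and a set of directory names to skip. The default exclusion set is an assumption; edit it for your repository. Pruning works because os.walk, when walking top-down (its default), only descends into the subdirectories that remain in the dirs list after you modify it in place.

```python
import os

# Hypothetical defaults: adjust to match the repository being processed
DEFAULT_EXTENSIONS = (".py",)
DEFAULT_EXCLUDED_DIRS = {".git", "__pycache__", "venv", ".venv", "node_modules"}


def get_source_files(directory, extensions=DEFAULT_EXTENSIONS,
                     excluded_dirs=DEFAULT_EXCLUDED_DIRS):
    """Return sorted paths of files with a matching extension, skipping excluded dirs."""
    matches = []
    for root, dirs, files in os.walk(directory):
        # Editing dirs in place stops os.walk from descending into excluded folders
        dirs[:] = [d for d in dirs if d not in excluded_dirs]
        for file in files:
            if file.endswith(tuple(extensions)):
                matches.append(os.path.join(root, file))
    return sorted(matches)
```

For example, get_source_files(path, extensions=(".py", ".pyi")) would also pick up type stub files. Since write_markdown tags every block as python, you would also want to adjust the fence's language tag when including non-Python files.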