This repository contains PowerShell scripts for processing PDF documents.
Convert-PdfTo-Text.ps1 recursively finds PDF files in a specified directory and converts them into text-based formats.
Features:
- Converts PDFs to individual .md or .txt files.
- Generates a frontmatter header in each file with metadata (source path, creation time, size, etc.).
- Optionally merges the content of all PDFs into a single file.
- Creates a conversion.log file in the output directory to track successes and failures.
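The frontmatter header mentioned above could be built along these lines. This is a minimal sketch, not the script's actual code: the function name `New-FrontMatterHeader` and the field names `source_path`, `created`, and `size_bytes` are assumptions chosen for illustration.

```powershell
# Hypothetical helper: builds a YAML-style frontmatter block for one file.
# Field names are illustrative and may differ from what the script writes.
function New-FrontMatterHeader {
    param(
        [Parameter(Mandatory)]
        [System.IO.FileInfo]$File
    )

    # YAML single-quoted strings escape a quote by doubling it;
    # backslashes in Windows paths are kept literally.
    $path = $File.FullName.Replace("'", "''")

    $lines = @(
        '---'
        "source_path: '$path'"
        "created: $($File.CreationTime.ToString('o'))"   # ISO 8601 round-trip format
        "size_bytes: $($File.Length)"
        '---'
        ''
    )
    return ($lines -join "`n")
}

# Usage: prepend the header to the extracted text before writing the output file.
# $pdf    = Get-Item 'C:\path\to\your\documents\example.pdf'
# $header = New-FrontMatterHeader -File $pdf
```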
Prerequisites:
- Requires the PSWritePDF PowerShell module. Install it by running:
Install-Module -Name PSWritePDF -Scope CurrentUser
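Before running the scripts, it can help to confirm that the module is actually available. The sketch below uses the standard `Get-Module -ListAvailable` cmdlet; the wrapper name `Test-ModuleAvailable` is a hypothetical helper, not part of this repository.

```powershell
# Hypothetical helper: returns $true when a module with this name is installed.
function Test-ModuleAvailable {
    param(
        [Parameter(Mandatory)]
        [string]$Name
    )
    return [bool](Get-Module -ListAvailable -Name $Name)
}

# Warn instead of installing automatically, so nothing changes without consent.
if (-not (Test-ModuleAvailable -Name 'PSWritePDF')) {
    Write-Warning 'PSWritePDF is not installed. Run: Install-Module -Name PSWritePDF -Scope CurrentUser'
}
```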
Usage:
To get help and see examples, run:
Get-Help .\Convert-PdfTo-Text.ps1 -Full
Examples:
- Convert all PDFs to individual Markdown files (output goes to a new output folder inside your target directory):
.\Convert-PdfTo-Text.ps1 -Path "C:\path\to\your\documents"
- Merge all PDFs into a single file:
.\Convert-PdfTo-Text.ps1 -Path "C:\path\to\your\documents" -MergeFiles
- Specify an output directory and format:
.\Convert-PdfTo-Text.ps1 -Path "C:\path\to\your\documents" -OutputDirectory "C:\my\output" -Format txt
Find-TextInPdf.ps1 recursively searches for a specific text string within all PDF files in a given directory.
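The matching step of such a search can be sketched separately from the PDF extraction. The helper name `Find-TextInPages` is hypothetical, and the case-insensitive literal match is an assumption about the desired behavior, not a description of the script's internals.

```powershell
# Hypothetical helper: returns the 1-based page numbers whose text
# contains the search string (literal, case-insensitive match).
function Find-TextInPages {
    param(
        [Parameter(Mandatory)]
        [AllowEmptyString()]
        [string[]]$PageText,

        [Parameter(Mandatory)]
        [string]$TextToSearch
    )

    for ($i = 0; $i -lt $PageText.Count; $i++) {
        $index = $PageText[$i].IndexOf($TextToSearch, [System.StringComparison]::OrdinalIgnoreCase)
        if ($index -ge 0) {
            $i + 1   # emit the page number
        }
    }
}

# Usage sketch. Assumption: PSWritePDF's Convert-PDFToText emits one string per page.
# Get-ChildItem -Path $Directory -Filter *.pdf -Recurse | ForEach-Object {
#     $pages = @(Convert-PDFToText -FilePath $_.FullName)
#     $hits  = @(Find-TextInPages -PageText $pages -TextToSearch $TextToSearch)
#     if ($hits.Count -gt 0) { '{0}: pages {1}' -f $_.FullName, ($hits -join ', ') }
# }
```

Keeping the match logic in its own function means it can be exercised with plain strings, without a PDF or the module on hand.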
Prerequisites:
- Requires the PSWritePDF PowerShell module.
Usage:
To get help and see examples, run:
Get-Help .\Find-TextInPdf.ps1 -Full
Example:
.\Find-TextInPdf.ps1 -Directory "C:\path\to\your\documents" -TextToSearch "your-text-here"
Enrich-FrontMatter.ps1 enriches the front matter of markdown files with document content metadata using the Gemini API.
Features:
- Extracts key metadata from the document content, such as document type, dates, parties involved, etc.
- Extracts specific metadata from SIGER documents (siger_numero_unico_de_documento and siger_fecha_de_inscripcion).
- Updates the markdown file's front matter with the new metadata.
- Creates an enrichment.log file in the input directory to track successes and failures.
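Updating a file's front matter with new metadata might look roughly like the sketch below. The function name `Add-FrontMatterField` and the example key `document_type` are hypothetical. The sketch assumes front matter delimited by `---` lines at the top of the file, appends new keys without deduplicating existing ones, and treats an empty front matter block as absent.

```powershell
# Hypothetical helper: appends key/value pairs to a '---' delimited
# front matter block, creating the block if the document has none.
function Add-FrontMatterField {
    param(
        [Parameter(Mandatory)]
        [string]$Markdown,

        [Parameter(Mandatory)]
        [System.Collections.IDictionary]$Fields
    )

    $newLines = foreach ($key in $Fields.Keys) { "${key}: $($Fields[$key])" }

    # Match a front matter block only at the very start of the document.
    $pattern = '(?s)\A---\r?\n(.*?)\r?\n---\r?\n?'
    $match   = [regex]::Match($Markdown, $pattern)

    if ($match.Success) {
        $body = $Markdown.Substring($match.Length)
        $all  = @($match.Groups[1].Value) + $newLines
    } else {
        $body = $Markdown
        $all  = @($newLines)
    }

    return "---`n" + ($all -join "`n") + "`n---`n" + $body
}

# Usage: an ordered dictionary keeps the new keys in a predictable order.
# $text    = Get-Content -Path $file -Raw
# $updated = Add-FrontMatterField -Markdown $text -Fields ([ordered]@{ document_type = 'contract' })
# Set-Content -Path $file -Value $updated -NoNewline
```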
Prerequisites:
- Requires the Gemini CLI to be installed and configured.
Usage:
To get help and see examples, run:
Get-Help .\Enrich-FrontMatter.ps1 -Full
Examples:
- Enrich all markdown files in a directory:
.\Enrich-FrontMatter.ps1 -InputDirectory "C:\path\to\your\markdown_files"
- Enrich files using a specific Gemini model:
.\Enrich-FrontMatter.ps1 -InputDirectory "C:\path\to\your\markdown_files" -GeminiModel "gemini-1.5-pro-latest"