Extract, transform, and load your data efficiently for Large Language Model (LLM) training and analytics
Extract text from .pdf and .txt files easily
Clean, tokenize, and chunk text for LLM-friendly datasets (see the chunking sketch below)
Output to JSONL, CSV, or directly to databases
Generate semantic graphs from code directories
Run as a web service for programmatic or UI-driven ETL
Intuitive React frontend for easy job configuration
High-performance ETL engine and REST API
User-friendly web UI
Serves static UI and reverse-proxies API requests
Containerized and orchestratable
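The chunking feature pairs a chunk size with an overlap, matching the `chunksize` and `overlap` fields in the API example further down. As a rough illustration only (not goetl's actual implementation), a word-level sliding-window chunker could look like the following; the function name and the whitespace tokenization are assumptions:

```go
package main

import (
	"fmt"
	"strings"
)

// chunkTokens splits a token slice into windows of chunkSize tokens,
// with each window overlapping the previous one by overlap tokens.
// Illustrative only; goetl's real tokenizer and chunker may differ.
func chunkTokens(tokens []string, chunkSize, overlap int) []string {
	if chunkSize <= 0 || overlap >= chunkSize {
		return nil
	}
	var chunks []string
	step := chunkSize - overlap
	for start := 0; start < len(tokens); start += step {
		end := start + chunkSize
		if end > len(tokens) {
			end = len(tokens)
		}
		chunks = append(chunks, strings.Join(tokens[start:end], " "))
		if end == len(tokens) {
			break
		}
	}
	return chunks
}

func main() {
	text := "the quick brown fox jumps over the lazy dog"
	// chunksize=4, overlap=1: consecutive chunks share one token
	for i, c := range chunkTokens(strings.Fields(text), 4, 1) {
		fmt.Printf("chunk %d: %q\n", i, c)
	}
}
```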
docker build -t anuragsingh086/goetl:latest .
docker run -p 8080:8080 -v $(pwd)/samples:/data anuragsingh086/goetl:latest
Web UI: http://localhost:8080
API: http://localhost:8080/api/etl
go run ./cmd/main.go -input samples/demo.pdf -output output/data.jsonl -format jsonl
POST /api/etl
{
  "input": "/data/demo.pdf",
  "output": "output/data.jsonl",
  "chunksize": 200,
  "overlap": 20,
  "format": "jsonl"
}
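The same job can be submitted programmatically. A minimal Go sketch, assuming the server from the quick start is listening on localhost:8080 and accepts the JSON body shown above (the struct fields simply mirror that example; the response format is not specified here, so the raw body is printed):

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
)

// etlJob mirrors the request body shown above.
type etlJob struct {
	Input     string `json:"input"`
	Output    string `json:"output"`
	ChunkSize int    `json:"chunksize"`
	Overlap   int    `json:"overlap"`
	Format    string `json:"format"`
}

func main() {
	job := etlJob{
		Input:     "/data/demo.pdf",
		Output:    "output/data.jsonl",
		ChunkSize: 200,
		Overlap:   20,
		Format:    "jsonl",
	}
	body, err := json.Marshal(job)
	if err != nil {
		log.Fatal(err)
	}

	resp, err := http.Post("http://localhost:8080/api/etl", "application/json", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// Print whatever the server returns.
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, string(out))
}
```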
We welcome contributions from the community!