---
name: rdf-generation
description: Generate comprehensive RDF (Turtle or JSON-LD) from URLs or local files using multiple LLM providers (OpenAI, Claude, Gemini, Grok, Mistral, Ollama, LM Studio). Use this skill when users want to (1) Extract structured RDF/Knowledge Graph data from documents, (2) Convert web pages, PDFs, Word docs, Excel, PowerPoint, CSV, or text files to RDF format, (3) Upload generated RDF to a SPARQL endpoint like Virtuoso, (4) Generate schema.org compliant semantic data from unstructured content.
---

# RDF Generation Skill

Generate comprehensive RDF (Turtle or JSON-LD) from URLs or local documents using any of the supported LLM providers, then upload it to a SPARQL endpoint, save it to a local file, or both.

## Quick Start

```bash
# Basic usage with OpenAI (default)
python scripts/rdf_generation.py --source https://example.com --output-mode file

# Use Claude
python scripts/rdf_generation.py --source document.pdf --llm-provider claude --output-mode file

# Use Gemini (free tier available)
python scripts/rdf_generation.py --source document.pdf --llm-provider gemini --output-mode file

# Use local Ollama (runs locally, no API key required)
python scripts/rdf_generation.py --source document.pdf --llm-provider ollama --output-mode file
```

## Supported LLM Providers

| Provider | API Key Env Var | Default Model |
|----------|-----------------|---------------|
| `openai` | `OPENAI_API_KEY` | `gpt-4o-mini` |
| `claude` | `ANTHROPIC_API_KEY` | `claude-sonnet-4-20250514` |
| `gemini` | `GOOGLE_API_KEY` | `gemini-3-flash-preview` |
| `grok` | `XAI_API_KEY` | `grok-2` |
| `mistral` | `MISTRAL_API_KEY` | `mistral-large-latest` |
| `ollama` | None (local) | `llama2` |
| `lmstudio` | None (local) | `local-model` |

For detailed provider configuration, see [references/llm-providers.md](references/llm-providers.md).
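
The table above can be read as a simple lookup. The following is a minimal sketch of how a provider name might be resolved to its API key variable and default model; `PROVIDERS` and `resolve_provider` are hypothetical names for illustration and are not taken from `scripts/rdf_generation.py`.

```python
import os

# Mirrors the provider table: provider -> (API key env var, default model).
# Local providers need no key, so their env var is None.
PROVIDERS = {
    "openai": ("OPENAI_API_KEY", "gpt-4o-mini"),
    "claude": ("ANTHROPIC_API_KEY", "claude-sonnet-4-20250514"),
    "gemini": ("GOOGLE_API_KEY", "gemini-3-flash-preview"),
    "grok": ("XAI_API_KEY", "grok-2"),
    "mistral": ("MISTRAL_API_KEY", "mistral-large-latest"),
    "ollama": (None, "llama2"),
    "lmstudio": (None, "local-model"),
}


def resolve_provider(provider, model=None, env=None):
    """Return the settings for a provider, failing early if its key is missing.

    Hypothetical helper: `env` defaults to os.environ and exists only so the
    lookup can be exercised without touching real environment variables.
    """
    env = os.environ if env is None else env
    if provider not in PROVIDERS:
        raise ValueError(f"Unknown provider: {provider!r}")
    key_var, default_model = PROVIDERS[provider]
    if key_var is not None and not env.get(key_var):
        raise RuntimeError(f"{key_var} must be set to use provider {provider!r}")
    # An explicit model (as with --model) overrides the table default.
    return {"provider": provider, "model": model or default_model, "api_key_var": key_var}
```

Checking for the key before any document is loaded avoids extracting a large file only to fail at the LLM call.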

## Supported Input Formats

- **URLs**: HTML web pages
- **PDF**: `.pdf` (requires `pypdf` or `pdfplumber`)
- **Word**: `.docx`, `.doc` (requires `python-docx`)
- **Excel**: `.xlsx`, `.xls` (requires `pandas`, `openpyxl`)
- **PowerPoint**: `.pptx`, `.ppt` (requires `python-pptx`)
- **CSV**: `.csv` (requires `pandas`)
- **Markdown**: `.md`, `.markdown`
- **HTML**: `.html`, `.htm`
- **Text**: `.txt`
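
As an illustration of how the list above maps a `--source` value to an input type, here is a minimal sketch. The category names and the `classify_source` function are assumptions made for this example; the actual script may dispatch differently.

```python
from pathlib import Path
from urllib.parse import urlparse

# File extension -> input category, copied from the supported formats list.
LOADER_BY_EXTENSION = {
    ".pdf": "pdf",
    ".docx": "word", ".doc": "word",
    ".xlsx": "excel", ".xls": "excel",
    ".pptx": "powerpoint", ".ppt": "powerpoint",
    ".csv": "csv",
    ".md": "markdown", ".markdown": "markdown",
    ".html": "html", ".htm": "html",
    ".txt": "text",
}


def classify_source(source: str) -> str:
    """Classify a source as 'url' or one of the file categories above."""
    # Anything with an http(s) scheme is treated as a web page to fetch.
    if urlparse(source).scheme in ("http", "https"):
        return "url"
    # Extensions are compared case-insensitively, so REPORT.PDF also matches.
    suffix = Path(source).suffix.lower()
    if suffix not in LOADER_BY_EXTENSION:
        raise ValueError(f"Unsupported input format: {source!r}")
    return LOADER_BY_EXTENSION[suffix]
```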

## Output Modes

| Mode | Description |
|------|-------------|
| `file` | Save RDF to local file only |
| `sparql` | Upload to SPARQL endpoint (default) |
| `both` | Upload AND save to local file |

## Key Command-Line Options

```
--source URL_OR_FILE      Source document (URL or file path)
--llm-provider PROVIDER   LLM provider to use
--model MODEL             Override default model
--format turtle|jsonld    Output format (default: turtle)
--output-mode MODE        Output destination: sparql (default), file, or both
--output-file PATH        Specific output file path
--graph-iri IRI           Named graph IRI for SPARQL upload
--sparql-endpoint URL     SPARQL endpoint URL
--user / --password       SPARQL endpoint credentials
--prompt-file PATH        Custom prompt template file
--debug                   Enable debug output
--verify                  Verify SPARQL upload with test query
```

## SPARQL Upload Example

```bash
python scripts/rdf_generation.py \
    --source https://example.com \
    --llm-provider claude \
    --output-mode sparql \
    --sparql-endpoint http://localhost:8890/sparql \
    --graph-iri urn:my:knowledge:graph \
    --user dba \
    --password dba \
    --verify
```
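
To check an upload independently of `--verify`, one option is to count the triples in the target graph. The sketch below assumes the endpoint accepts standard SPARQL 1.1 Protocol GET requests and returns SPARQL JSON results; the exact query that `--verify` runs is not documented here, so treat this as an approximation. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_count_query(graph_iri: str) -> str:
    """Build a query that counts the triples stored in one named graph."""
    # Reject characters that are not allowed inside a SPARQL IRI reference.
    if any(ch in graph_iri for ch in '<>"{}|^`\\ '):
        raise ValueError(f"Invalid graph IRI: {graph_iri!r}")
    return f"SELECT (COUNT(*) AS ?triples) WHERE {{ GRAPH <{graph_iri}> {{ ?s ?p ?o }} }}"


def build_verify_url(endpoint: str, graph_iri: str) -> str:
    """Encode the count query as a SPARQL Protocol GET URL."""
    return f"{endpoint}?{urlencode({'query': build_count_query(graph_iri)})}"


def count_triples(endpoint: str, graph_iri: str) -> int:
    """Run the count query; assumes the endpoint allows unauthenticated reads."""
    request = Request(
        build_verify_url(endpoint, graph_iri),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request, timeout=30) as response:
        results = json.load(response)
    return int(results["results"]["bindings"][0]["triples"]["value"])
```

For the upload above, `count_triples("http://localhost:8890/sparql", "urn:my:knowledge:graph")` should return a non-zero number once the graph has been populated.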

## Custom Prompts

Create custom prompts with `{source_uri}` and `{document_text}` placeholders:

```bash
python scripts/rdf_generation.py \
    --source document.pdf \
    --prompt-file my-custom-prompt.txt \
    --output-mode file
```

**Critical**: Prompts MUST contain the `{document_text}` placeholder. Without it, the LLM never receives the source text and the output will be hallucinated.
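
A prompt file can be checked before running the script. This is a minimal sketch that assumes the placeholders are filled by plain text substitution; how the script actually substitutes them is not specified here, and `SAMPLE_TEMPLATE` and `render_prompt` are hypothetical names.

```python
# Hypothetical template showing both placeholders in use.
SAMPLE_TEMPLATE = """Generate schema.org RDF in Turtle for the document below.
Use {source_uri} as the base IRI for all entities.

Document:
{document_text}
"""


def render_prompt(template: str, source_uri: str, document_text: str) -> str:
    """Fill in a prompt template, refusing templates that omit the document."""
    if "{document_text}" not in template:
        raise ValueError("Prompt template must contain the {document_text} placeholder")
    # Plain replacement (rather than str.format) keeps literal braces, such as
    # a JSON-LD example inside the template, from being treated as fields.
    # The URI is substituted first so document text is never rescanned for it.
    return template.replace("{source_uri}", source_uri).replace("{document_text}", document_text)
```

`{source_uri}` is treated as optional in this sketch; only the missing document placeholder is an error.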

## Dependencies

Install all dependencies:

```bash
pip install requests beautifulsoup4 rdflib pypdf pdfplumber python-docx python-pptx pandas openpyxl
```

## Workflow

1. **Load source** - Extract text from the URL or document
2. **Generate RDF** - Send the extracted text to the LLM with the schema.org prompt template
3. **Validate** - Parse and validate the generated RDF using rdflib
4. **Output** - Save to a file and/or upload to the SPARQL endpoint
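
The validation step can be approximated as follows. This is a sketch, not the script's own code: `validate_rdf` is a hypothetical helper, and it assumes rdflib 6 or later, where the JSON-LD parser is bundled.

```python
SAMPLE_TURTLE = """
@prefix schema: <https://schema.org/> .

<https://example.com/#page> a schema:WebPage ;
    schema:name "Example" .
"""


def validate_rdf(rdf_text: str, rdf_format: str = "turtle") -> int:
    """Parse rdf_text and return its triple count; raise ValueError if unusable."""
    # Imported lazily so a missing rdflib install surfaces as a plain ImportError.
    from rdflib import Graph

    # rdflib names its JSON-LD parser "json-ld"; the CLI spells the format "jsonld".
    parser_name = "json-ld" if rdf_format == "jsonld" else rdf_format
    graph = Graph()
    try:
        graph.parse(data=rdf_text, format=parser_name)
    except Exception as exc:  # syntax error types differ between parsers
        raise ValueError(f"Generated RDF is not valid {rdf_format}: {exc}") from exc
    # Syntactically valid but empty output is treated as a failed generation.
    if len(graph) == 0:
        raise ValueError("Generated RDF parsed cleanly but contains no triples")
    return len(graph)
```

Parsing only confirms that the output is well-formed RDF. It does not check that the triples reflect the source document.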
