Tagging MCP
MCP server for tagging CSV rows using polar_llama with parallel LLM inference.
Overview
This MCP server enables fast, parallel tagging of CSV data using multiple LLM providers. It leverages polar_llama to process rows concurrently, making it ideal for batch classification and tagging tasks.
Features
- Parallel Processing: Tag hundreds or thousands of CSV rows concurrently
- Multiple LLM Providers: Support for Claude (Anthropic), OpenAI, Gemini, and Groq
- Structured Output: Uses Pydantic models for consistent, type-safe results
- Flexible Taxonomy: Define custom tag lists for your use case
- Optional Reasoning: Include confidence levels and explanations for tags
Installation
Prerequisites
- Python 3.12+
- UV package manager
- API key for at least one LLM provider
Environment Setup
- Clone this repository
- Create a
.envfile with your API keys:
ANTHROPIC_API_KEY=your_key_here
OPENAI_API_KEY=your_key_here
GEMINI_API_KEY=your_key_here
GROQ_API_KEY=your_key_here
Claude Desktop Configuration
Option 1: Local Development (Recommended)
Run directly without containers:
{
"mcpServers": {
"tagging-mcp": {
"command": "uv",
"args": ["run", "fastmcp", "run", "/path/to/tagging_mcp/tagging.py"]
}
}
}
Option 2: Container Deployment
Build the container:
container build -t tagging_mcp .
Configure Claude Desktop:
{
"mcpServers": {
"tagging-mcp": {
"command": "container",
"args": ["run", "--interactive", "tagging_mcp"]
}
}
}
Available Tools
tag_csv
Simple tagging with a list of categories. Perfect for basic classification tasks.
Parameters:
csv_path(str): Path to the CSV file to tagtaxonomy(List[str]): List of possible tags/categories (e.g., ["technology", "business", "science"])text_column(str, optional): Column containing text to analyze (default: "text")provider(str, optional): LLM provider - "claude", "openai", "gemini", "groq", or "bedrock" (default: "groq")model(str, optional): Model identifier (default: "llama-3.3-70b-versatile")api_key(str, optional): API key if not set via environment variableoutput_path(str, optional): Path to save tagged CSVinclude_reasoning(bool, optional): Include detailed reasoning and reflection (default: false)field_name(str, optional): Name for the classification field (default: "category")
Returns: Dictionary with status, tagged data preview, confidence scores, and optional errors
tag_csv_advanced
Advanced multi-dimensional classification with custom taxonomy definitions. Use this for complex tagging with multiple fields.
Parameters:
csv_path(str): Path to the CSV file to tagtaxonomy(Dict): Full taxonomy dictionary with field definitions and value descriptionstext_column(str, optional): Column containing text to analyze (default: "text")provider(str, optional): LLM provider (default: "groq")model(str, optional): Model identifier (default: "llama-3.3-70b-versatile")api_key(str, optional): API key if not set via environment variableoutput_path(str, optional): Path to save tagged CSVinclude_reasoning(bool, optional): Include detailed reasoning (default: false)
Example Taxonomy:
{
"sentiment": {
"description": "The emotional tone of the text",
"values": {
"positive": "Text expresses positive emotions or favorable opinions",
"negative": "Text expresses negative emotions or unfavorable opinions",
"neutral": "Text is factual and objective"
}
},
"urgency": {
"description": "How urgent the content is",
"values": {
"high": "Requires immediate attention",
"medium": "Should be addressed soon",
"low": "Can be addressed at any time"
}
}
}
Returns: Dictionary with status, all field values, confidence scores per field, and optional reasoning
preview_csv
Preview the first few rows of a CSV file to understand its structure.
Parameters:
csv_path(str): Path to the CSV filerows(int, optional): Number of rows to preview (default: 5)
Returns: Dictionary with columns, row count, and preview data
get_tagging_info
Get information about the tagging MCP server and supported providers.
Returns: Server metadata, supported providers, features, and available tools
Example Usage
Basic Tagging
-
Preview your CSV:
- Use
preview_csvwithcsv_path="/path/to/data.csv"
- Use
-
Simple category tagging:
- Use
tag_csvwith:csv_path="/path/to/data.csv"taxonomy=["technology", "business", "science", "politics"]text_column="description"output_path="/path/to/tagged_output.csv"
- Use
-
Include reasoning for transparency:
- Use
tag_csvwith:csv_path="/path/to/data.csv"taxonomy=["urgent", "normal", "low_priority"]field_name="priority"include_reasoning=true
- Use
Advanced Multi-Field Tagging
For complex classification with multiple dimensions:
Use tag_csv_advanced with:
{
"csv_path": "/path/to/support_tickets.csv",
"taxonomy": {
"department": {
"description": "Which department should handle this",
"values": {
"sales": "Product inquiries and purchases",
"support": "Technical issues and bugs",
"billing": "Payment and account questions"
}
},
"priority": {
"description": "How urgent this is",
"values": {
"urgent": "Service down or critical issue",
"high": "Significant problem",
"normal": "Standard request"
}
}
},
"text_column": "ticket_description",
"output_path": "/path/to/classified_tickets.csv"
}
Output Structure
Basic Tagging Output
- Original CSV columns
{field_name}: The selected tagconfidence: Confidence score (0.0 to 1.0)thinking: Reasoning for each possible value (ifinclude_reasoning=true)reflection: Overall analysis (ifinclude_reasoning=true)
Advanced Tagging Output
- Original CSV columns
- For each taxonomy field:
{field_name}: Selected value{field_name}_confidence: Confidence score{field_name}_thinking: Reasoning dict (if enabled){field_name}_reflection: Analysis (if enabled)
Supported LLM Providers
-
Groq (Recommended):
- Tool-use optimized: llama-3-groq-70b-tool-use (90.76% BFCL accuracy, #1 ranked), llama-3-groq-8b-tool-use (89.06% BFCL accuracy)
- Compound AI: groq/compound (multi-tool), groq/compound-mini (single-tool, 3x faster)
- GPT-OSS models: gpt-oss-120b (120B params, reasoning), gpt-oss-20b (21B MoE, 3.6B active), gpt-oss-safeguard-20b (safety)
- Kimi-K2: kimi-k2-instruct (1T params, 32B active, 256k context), kimi-k2-0905 (improved coding & tool calling)
- Llama 4: llama-4-scout (17Bx16MoE, 128k context), llama-4-maverick (17Bx128E, 128k context)
- Llama 3.x: llama-3.3-70b-versatile, llama-3.1-8b-instant, llama3-70b-8192, llama3-8b-8192
- Reasoning: deepseek-r1-distill-llama-70b (128k context), qwen-qwq-32b
- Vision: llama-3.2-11b-vision-preview, llama-3.2-90b-vision-preview (up to 5 images)
- Other: gemma2-9b-it, qwen3-32b, mixtral-8x7b-32768
-
OpenAI:
- GPT-5 (Latest): gpt-5 (full reasoning, freeform tool calling), gpt-5-mini (balanced), gpt-5-nano (low-cost/latency)
- GPT-4.x: gpt-4.5, gpt-4o, gpt-4o-mini
- Legacy: gpt-4-turbo, gpt-3.5-turbo
-
Claude (Anthropic):
- Latest (2025): claude-sonnet-4-5, claude-opus-4-1, claude-haiku-4-5, claude-sonnet-4
- Legacy: claude-3-5-sonnet-20241022, claude-3-opus-20240229
-
Gemini (Google):
- Latest (2025): gemini-2.0-flash (1M token context, simplified function calling)
- Current: gemini-1.5-pro, gemini-1.5-flash
-
AWS Bedrock:
- Claude 4 family: claude-sonnet-4-5, claude-opus-4-1, claude-haiku-4-5, claude-sonnet-4
- Legacy: anthropic.claude-3-sonnet, anthropic.claude-3-haiku
Key Features
- ✨ Detailed Reasoning: For each tag, see why the model chose it
- 🔍 Reflection: Model reflects on its analysis
- 📊 Confidence Scores: Know how confident each classification is (0.0-1.0)
- ⚡ Parallel Processing: All rows processed concurrently
- 🎯 Error Detection: Automatic error tracking and reporting
- 🔧 Flexible: Simple list or complex multi-field taxonomies