Open Source Library

Polar Llama Documentation

A Python library for parallel LLM inference using Polars DataFrames

Version: 0.2.2
Overview

Polar Llama is a Python library that enables parallel inference calls to multiple Large Language Model providers through Polars dataframes. It streamlines batch processing of AI queries without serial request delays, making it ideal for data-intensive AI applications.

Concurrent Processing

Send multiple inference requests in parallel without waiting for individual completions

Polars Integration

Leverages efficient Polars dataframe operations for request management

Multi-turn Conversations

Supports context-preserving conversations across multiple message exchanges

Multiple Providers

Connects with OpenAI, Anthropic, Gemini, Groq, and AWS Bedrock models

NEW in 0.2.2

Embeddings & Vector Operations

Generate embeddings and perform vector similarity searches with ANN, KNN, and cosine similarity

NEW in 0.2.2

Cost Analytics

Track and calculate LLM inference costs to monitor and optimize your AI spending

NEW in 0.2.2

llama Namespace & Performance

New namespace organization for cleaner imports, plus Link-Time Optimization for faster execution

0.2.1

Taxonomy-based Tagging

Classify documents with custom taxonomies including detailed reasoning, reflection, and confidence scores

0.2.0

Structured Output Support

Native support for structured outputs with Pydantic models and JSON schema validation

Installation

Using pip

pip install polar-llama==0.2.2

Development Installation

# From a local clone of the repository
maturin develop
Quick Start
Get started with a simple example
import polars as pl
from polar_llama import string_to_message, inference_async, Provider
import dotenv

# Load environment variables
dotenv.load_dotenv()

# Create a DataFrame with questions
questions = [
    'What is the capital of France?',
    'What is the difference between polars and pandas?',
    'Explain async programming in Python'
]

df = pl.DataFrame({'Questions': questions})

# Convert questions to LLM messages
df = df.with_columns(
    prompt=string_to_message("Questions", message_type='user')
)

# Run parallel inference
df = df.with_columns(
    answer=inference_async('prompt', provider=Provider.OPENAI,
                          model='gpt-4o-mini')
)

# Display results
print(df)

Embeddings & Vector Operations

NEW in 0.2.2
Overview
Generate embeddings and perform vector similarity searches at scale

Version 0.2.2 introduces powerful embedding and vector operations that enable semantic search, document similarity, and clustering capabilities. Generate embeddings from your text data and perform efficient similarity searches using industry-standard algorithms.

Embedding Generation

Generate vector embeddings from text using OpenAI, Cohere, or other embedding providers

Cosine Similarity

Calculate similarity scores between vectors for semantic matching

K-Nearest Neighbors (KNN)

Find the k most similar items using exact nearest neighbor search

Approximate NN (ANN)

Fast approximate similarity search for large-scale datasets

Generating Embeddings
Create vector embeddings from your text data
import polars as pl
from polar_llama import embed_async, Provider

# Create a DataFrame with text to embed
df = pl.DataFrame({
    "id": [1, 2, 3],
    "text": [
        "Machine learning is a subset of artificial intelligence",
        "Deep learning uses neural networks with many layers",
        "Natural language processing analyzes human language"
    ]
})

# Generate embeddings
df = df.with_columns(
    embedding=embed_async(
        "text",
        provider=Provider.OPENAI,
        model="text-embedding-3-small"
    )
)

print(df.select(["id", "text", "embedding"]))
Vector Similarity Search
Find similar documents using cosine similarity and KNN

Cosine Similarity

from polar_llama import cosine_similarity

# Calculate similarity between query and documents
query_embedding = [0.1, 0.2, 0.3, ...]  # Your query vector

df = df.with_columns(
    similarity=cosine_similarity("embedding", query_embedding)
)

# Sort by similarity to find most relevant documents
results = df.sort("similarity", descending=True).head(10)
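Cosine similarity itself is just the dot product of two vectors divided by the product of their norms. A minimal pure-Python sketch of the computation (independent of the library's `cosine_similarity` expression, which operates on whole columns):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (||a|| * ||b||)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Vectors pointing the same direction score 1.0; orthogonal vectors score 0.0
print(cosine([1.0, 0.0], [2.0, 0.0]))  # 1.0
print(cosine([1.0, 0.0], [0.0, 3.0]))  # 0.0
```

Because the result depends only on direction, not magnitude, it works well for comparing embeddings of texts with very different lengths.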

K-Nearest Neighbors

from polar_llama import knn_search

# Find the 5 most similar documents
results = knn_search(
    df,
    query_embedding,
    embedding_col="embedding",
    k=5
)

print(results)
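Conceptually, exact KNN is a brute-force scan: score every stored vector against the query and keep the top k. A hypothetical pure-Python equivalent (not the library's implementation, which works on DataFrame columns) makes the idea concrete:

```python
import math

def knn(query: list[float], vectors: list[list[float]], k: int) -> list[int]:
    """Return the indices of the k vectors most similar to the query (cosine)."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) *
                      math.sqrt(sum(x * x for x in b)))
    # Rank every index by similarity to the query, highest first
    ranked = sorted(range(len(vectors)),
                    key=lambda i: cosine(query, vectors[i]),
                    reverse=True)
    return ranked[:k]

docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(knn([1.0, 0.0], docs, k=2))  # [0, 1]
```

This O(n) scan per query is why ANN methods exist: they trade a little accuracy for sub-linear search on large collections.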

Approximate Nearest Neighbors (ANN)

from polar_llama import ann_search

# Fast approximate search for large datasets
results = ann_search(
    df,
    query_embedding,
    embedding_col="embedding",
    k=10,
    n_probes=10  # Trade-off between speed and accuracy
)

print(results)
Common Use Cases
Real-world applications of embeddings and vector search

Semantic Search

Find documents by meaning rather than keyword matching

Document Clustering

Group similar documents together automatically

Recommendation Systems

Suggest similar items based on content similarity

Duplicate Detection

Identify near-duplicate content in large datasets

Cost Analytics

NEW in 0.2.2
Overview
Track and optimize your LLM inference costs

The new cost analytics feature helps you monitor and manage your AI spending by calculating the cost of each inference call. Track costs per request, aggregate spending over time, and identify opportunities to optimize your LLM usage.

Per-Request Costs

Calculate the exact cost of each inference call based on token usage

Provider Pricing

Built-in pricing data for OpenAI, Anthropic, and other providers

Cost Aggregation

Sum up costs across batches, time periods, or custom groupings

Budget Monitoring

Set cost thresholds and track spending against budgets

Calculating Inference Costs
Track costs for your LLM requests
import polars as pl
from polar_llama import inference_async, calculate_cost, Provider

# Run inference and track costs
df = df.with_columns(
    answer=inference_async(
        'prompt',
        provider=Provider.OPENAI,
        model='gpt-4o-mini',
        return_usage=True  # Enable usage tracking
    )
)

# Calculate cost for each request
df = df.with_columns(
    cost=calculate_cost(
        "answer",
        provider=Provider.OPENAI,
        model='gpt-4o-mini'
    )
)

# Get total cost
total_cost = df.select(pl.col("cost").sum()).item()
print(f"Total inference cost: ${total_cost:.4f}")

# Cost breakdown by category
cost_by_category = df.group_by("category").agg(
    pl.col("cost").sum().alias("total_cost"),
    pl.col("cost").mean().alias("avg_cost"),
    pl.col("cost").count().alias("request_count")
)
print(cost_by_category)

Taxonomy-based Tagging

0.2.1
Overview
Classify documents according to a custom taxonomy with detailed reasoning and confidence scores

Taxonomy-based tagging is a powerful feature that allows you to classify documents according to a custom taxonomy with detailed reasoning, reflection, and confidence scores. This feature is particularly useful for content classification, customer support routing, email triage, sentiment analysis, and multi-label classification.

Detailed Reasoning

For each possible value in each field, the model provides its reasoning

Reflection

After considering all options, the model reflects on its analysis

Confidence Scores

Each classification includes a confidence score (0.0 to 1.0)

Parallel Processing

Multiple documents and fields are processed in parallel automatically

Quick Start Example
Get started with taxonomy tagging in just a few lines of code
import polars as pl
from polar_llama import tag_taxonomy, Provider

# Define your taxonomy
taxonomy = {
    "sentiment": {
        "description": "The emotional tone of the text",
        "values": {
            "positive": "Text expresses positive emotions or favorable opinions",
            "negative": "Text expresses negative emotions or unfavorable opinions",
            "neutral": "Text is factual and objective without clear emotional content"
        }
    },
    "urgency": {
        "description": "How urgent the content is",
        "values": {
            "high": "Requires immediate attention",
            "medium": "Should be addressed soon",
            "low": "Can be addressed at any time"
        }
    }
}

# Create a dataframe
df = pl.DataFrame({
    "id": [1, 2],
    "message": [
        "URGENT: Server is down!",
        "Thanks for your help yesterday."
    ]
})

# Apply taxonomy tagging
result = df.with_columns(
    tags=tag_taxonomy(
        pl.col("message"),
        taxonomy,
        provider=Provider.GROQ,
        model="openai/gpt-oss-120b"
    )
)

# Extract specific values
result.select([
    "message",
    pl.col("tags").struct.field("sentiment").struct.field("value").alias("sentiment"),
    pl.col("tags").struct.field("sentiment").struct.field("confidence").alias("confidence"),
    pl.col("tags").struct.field("urgency").struct.field("value").alias("urgency")
])
Defining a Taxonomy
Learn how to create effective taxonomy definitions

A taxonomy is defined as a dictionary with the following structure:

taxonomy = {
    "field_name": {
        "description": "What this field represents",
        "values": {
            "value1": "Definition of value1",
            "value2": "Definition of value2",
            # ... more values
        }
    },
    # ... more fields
}

Design Tips

  • Clear Definitions: Make value definitions specific and mutually exclusive
  • Appropriate Granularity: 3-5 values per field works well; too many can confuse the model
  • Balanced Options: Try to provide balanced options that cover the full range
  • Domain-Specific: Tailor definitions to your specific use case
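Since a taxonomy is just a nested dictionary, it is easy to sanity-check against these tips before sending it to a provider. A small hypothetical validator (not part of the library):

```python
def validate_taxonomy(taxonomy: dict) -> list[str]:
    """Return a list of problems found in a taxonomy definition."""
    problems = []
    for field, spec in taxonomy.items():
        if "description" not in spec:
            problems.append(f"{field}: missing 'description'")
        values = spec.get("values", {})
        if not values:
            problems.append(f"{field}: no 'values' defined")
        elif len(values) > 7:
            problems.append(f"{field}: {len(values)} values may be too many")
        for value, definition in values.items():
            if not definition:
                problems.append(f"{field}.{value}: empty definition")
    return problems

taxonomy = {"sentiment": {"description": "Emotional tone",
                          "values": {"positive": "Favorable", "negative": ""}}}
print(validate_taxonomy(taxonomy))  # ['sentiment.negative: empty definition']
```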
Output Structure
Understanding the structured output format

Each tagged document returns a Struct with the following nested structure:

{
    "field_name": {
        "thinking": {
            "value1": "Reasoning about why value1 might apply...",
            "value2": "Reasoning about why value2 might apply...",
            # ... reasoning for each possible value
        },
        "reflection": "Overall reflection on the analysis of this field...",
        "value": "selected_value",  # The chosen value
        "confidence": 0.87  # Confidence score (0.0 to 1.0)
    },
    # ... more fields
}
thinking:

A dictionary with reasoning for each possible value in the taxonomy

reflection:

The model's overall reflection after considering all options

value:

The selected value (one of the values from the taxonomy)

confidence:

How confident the model is in its selection (0.0 = not confident, 1.0 = very confident)

Accessing Results
How to extract and work with taxonomy tags

Extract Specific Fields

# Get just the selected value
sentiment = result_df.select(
    pl.col("tags").struct.field("sentiment").struct.field("value")
)

# Get value and confidence together
sentiment_analysis = result_df.select([
    pl.col("tags").struct.field("sentiment").struct.field("value").alias("sentiment"),
    pl.col("tags").struct.field("sentiment").struct.field("confidence").alias("confidence")
])

Access Detailed Reasoning

# Get the thinking for a specific field
thinking = result_df.select(
    pl.col("tags").struct.field("sentiment").struct.field("thinking")
)

# Get the reflection
reflection = result_df.select(
    pl.col("tags").struct.field("sentiment").struct.field("reflection")
)

Multiple Fields at Once

# Create a clean summary view
summary = result_df.select([
    "id",
    "document",
    pl.col("tags").struct.field("sentiment").struct.field("value").alias("sentiment"),
    pl.col("tags").struct.field("urgency").struct.field("value").alias("urgency"),
    pl.col("tags").struct.field("category").struct.field("value").alias("category")
])
Advanced Usage
Advanced patterns for filtering and aggregation

Filtering by Confidence

# Only keep high-confidence results
high_confidence = result_df.filter(
    pl.col("tags").struct.field("sentiment").struct.field("confidence") > 0.8
)

Combining Multiple Conditions

# Find negative, urgent items with high confidence
critical = result_df.filter(
    (pl.col("tags").struct.field("sentiment").struct.field("value") == "negative") &
    (pl.col("tags").struct.field("urgency").struct.field("value") == "high") &
    (pl.col("tags").struct.field("urgency").struct.field("confidence") > 0.7)
)

Aggregating by Category

# Count documents by sentiment
sentiment_counts = result_df.group_by(
    pl.col("tags").struct.field("sentiment").struct.field("value")
).len()

# Average confidence by category
avg_confidence = result_df.group_by(
    pl.col("tags").struct.field("category").struct.field("value")
).agg(
    pl.col("tags").struct.field("category").struct.field("confidence").mean()
)
Common Use Cases
Real-world applications of taxonomy-based tagging

1. Customer Support Routing

taxonomy = {
    "department": {
        "description": "Which department should handle this",
        "values": {
            "sales": "Product inquiries and purchases",
            "support": "Technical issues and bugs",
            "billing": "Payment and account questions"
        }
    },
    "priority": {
        "description": "How urgent this is",
        "values": {
            "urgent": "Service down or critical issue",
            "high": "Significant problem affecting work",
            "normal": "Standard request or question"
        }
    }
}

2. Content Classification

taxonomy = {
    "category": {
        "description": "Main topic area",
        "values": {
            "technology": "Tech, software, or digital topics",
            "business": "Business, finance, or economics",
            "lifestyle": "Health, wellness, or personal topics"
        }
    },
    "content_type": {
        "description": "Format and purpose",
        "values": {
            "tutorial": "Step-by-step instructional content",
            "analysis": "In-depth examination of a topic",
            "news": "Timely reporting of events"
        }
    }
}

3. Social Media Analysis

taxonomy = {
    "sentiment": {
        "description": "Emotional tone",
        "values": {
            "positive": "Positive emotions or opinions",
            "negative": "Negative emotions or criticism",
            "neutral": "Factual without clear emotion"
        }
    },
    "topic": {
        "description": "Main subject discussed",
        "values": {
            "product": "Discussion of product features",
            "service": "Customer service experience",
            "brand": "General brand perception"
        }
    },
    "intent": {
        "description": "What the author wants",
        "values": {
            "complaint": "Expressing dissatisfaction",
            "praise": "Sharing positive experience",
            "question": "Seeking information"
        }
    }
}
API Reference
tag_taxonomy() function signature and parameters
def tag_taxonomy(
    expr: IntoExpr,
    taxonomy: Dict[str, Dict[str, Any]],
    *,
    provider: Optional[Union[str, Provider]] = None,
    model: Optional[str] = None,
) -> pl.Expr
expr:

The document expression to analyze and tag

taxonomy:

Dictionary defining the taxonomy structure

provider:

The LLM provider to use (OpenAI, Anthropic, Gemini, Groq, Bedrock)

model:

The specific model name to use

Returns:

Polars Expression with structured tags as a Struct column

Best Practices
Tips for effective taxonomy-based tagging
  1. Start Simple: Begin with 2-3 fields and expand as needed
  2. Test Definitions: Verify that your value definitions are clear and distinguishable
  3. Use Confidence Scores: Filter or flag low-confidence results for review
  4. Validate Results: Spot-check classifications to ensure quality
  5. Iterate: Refine your taxonomy based on results
  6. Handle Errors: Always check for and handle error cases

Structured Outputs

0.2.0
What are Structured Outputs?
Get type-safe, validated responses from LLMs in a predictable format

Structured outputs allow you to define the exact schema you want the LLM to follow, ensuring responses are properly formatted and can be directly used in your data pipelines. This is perfect for extracting specific information, generating consistent data, or integrating LLM outputs with databases and APIs.

Type Safety

Define your output schema with Pydantic models for guaranteed type correctness

Validation

Automatic validation ensures responses match your schema before processing

Consistency

Get predictable, parseable outputs across all your inference requests

Easy Integration

Seamlessly integrate with databases, APIs, and data processing pipelines

Basic Structured Output
Define a simple schema and get structured responses
import polars as pl
from polar_llama import string_to_message, inference_async, Provider
from pydantic import BaseModel

# Define your output schema
class ProductInfo(BaseModel):
    name: str
    price: float
    category: str
    in_stock: bool

# Create prompts
prompts = [
    "Extract product info: iPhone 15 Pro for $999 in Electronics, available",
    "Extract product info: Nike Air Max shoes for $129.99 in Footwear, sold out",
    "Extract product info: Laptop Stand for $49.99 in Accessories, in stock"
]

df = pl.DataFrame({'prompt': prompts})

# Convert to messages
df = df.with_columns(
    message=string_to_message("prompt", message_type='user')
)

# Run inference with structured output
df = df.with_columns(
    product=inference_async(
        'message',
        provider=Provider.OPENAI,
        model='gpt-4o-2024-08-06',
        response_model=ProductInfo  # Specify your Pydantic model
    )
)

# Access structured fields directly
print(df.select(['product']))

Examples & Cookbooks

Multi-Message Conversations
Maintain context across multiple messages for more natural interactions
import polars as pl
from polar_llama import string_to_message, inference_async

# Create a DataFrame with system prompts and user questions
df = pl.DataFrame({
    "system_prompt": [
        "You are a helpful assistant.",
        "You are a math expert.",
        "You are a creative writer."
    ],
    "user_question": [
        "What's the weather like today?",
        "Solve x^2 + 5x + 6 = 0",
        "Write a haiku about coding"
    ]
})

# Convert both columns to messages
df = df.with_columns([
    string_to_message("system_prompt", message_type="system").alias("system_message"),
    string_to_message("user_question", message_type="user").alias("user_message")
])

# Combine messages into conversations
from polar_llama import combine_messages, inference_messages
df = df.with_columns(
    combine_messages("system_message", "user_message").alias("conversation")
)

# Run inference with combined messages
df = df.with_columns(
    inference_messages("conversation",
           provider="openai",
           model="gpt-4").alias("response")
)

print(df.select(["user_question", "response"]))
Data Analysis Pipeline
Process customer feedback at scale
import polars as pl
from polar_llama import string_to_message, inference_async, Provider

# Load customer feedback data
feedback_df = pl.DataFrame({
    'customer_id': [101, 102, 103, 104, 105],
    'feedback': [
        'The product is amazing but shipping was slow',
        'Great quality, highly recommend!',
        'Disappointed with customer service',
        'Perfect for my needs, will buy again',
        'Product arrived damaged, requesting refund'
    ]
})

# Create sentiment analysis prompts
sentiment_prompt = """Analyze the sentiment of this customer feedback
and classify it as Positive, Negative, or Neutral.
Also provide a brief reason.

Feedback: {}"""

df = feedback_df.with_columns(
    prompt=pl.format(sentiment_prompt, pl.col('feedback'))
)

# Convert to messages and run inference
df = df.with_columns(
    message=string_to_message("prompt", message_type='user')
)

df = df.with_columns(
    sentiment_analysis=inference_async('message',
                                      provider=Provider.OPENAI,
                                      model='gpt-4o-mini')
)

# Extract key insights
print(df.select(['customer_id', 'feedback', 'sentiment_analysis']))
Provider Support
Polar Llama supports multiple LLM providers

OpenAI

df = df.with_columns(
    answer=inference_async('prompt',
                          provider=Provider.OPENAI,
                          model='gpt-4o-mini')
)

Structured Outputs: Supported on gpt-4o-2024-08-06 and later models with the response_model parameter

Anthropic (Claude)

df = df.with_columns(
    answer=inference_async('prompt',
                          provider=Provider.ANTHROPIC,
                          model='claude-3-haiku-20240307')
)

Structured Outputs: Supported on Claude 3.5 Sonnet and later with the response_model parameter

AWS Bedrock

# Requires AWS credentials configured
df = df.with_columns(
    answer=inference_async('prompt',
                          provider='bedrock',
                          model='anthropic.claude-3-haiku-20240307-v1:0')
)

Google Gemini

df = df.with_columns(
    answer=inference_async('prompt',
                          provider=Provider.GEMINI,
                          model='gemini-pro')
)

Groq

df = df.with_columns(
    answer=inference_async('prompt',
                          provider=Provider.GROQ,
                          model='llama3-70b-8192')
)
Advanced Features

Environment Configuration

Set up your API keys in a .env file:

OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
GEMINI_API_KEY=your_gemini_key
GROQ_API_KEY=your_groq_key
AWS_ACCESS_KEY_ID=your_aws_key
AWS_SECRET_ACCESS_KEY=your_aws_secret
AWS_REGION=us-east-1
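A quick startup check against `os.environ` can catch missing keys before any requests are made. A small sketch (the required list below is an example; include only the providers your pipeline actually uses):

```python
import os

REQUIRED_KEYS = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY"]  # adjust per pipeline

def missing_keys(env=os.environ, required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [k for k in required if not env.get(k)]

absent = missing_keys({"OPENAI_API_KEY": "sk-..."})
print(absent)  # ['ANTHROPIC_API_KEY']
```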

Testing

Run tests with configured providers:

pip install -r tests/requirements.txt
pytest tests/ -v
cargo test --test model_client_tests -- --nocapture
Common Use Cases

Data Analysis

Process large datasets with AI insights - sentiment analysis, classification, entity extraction with validated structured outputs

Content Generation

Generate product descriptions, marketing copy, or documentation at scale with consistent formatting

Research & Summarization

Summarize documents, extract key points with structured metadata, or answer questions about large text corpora

Automation

Automate repetitive AI tasks like code review, email categorization, or data enrichment with type-safe outputs

Licensed under MIT

Questions or issues? Open an issue on GitHub

David Drummond