Paperine CLI

A standalone command-line research assistant with enhanced PDF processing and AI-powered analysis

Access ArXiv papers with multi-layered PDF extraction, mathematical symbol preservation, and AI-powered summarization, all without a backend server of your own. Paper search and summarization still call the ArXiv and Watsonx APIs over the network.

🔍 ArXiv Search & Analysis

📄 Enhanced PDF Processing

∑ Mathematical Symbol Support

🤖 AI Summarization

🚀 Key Features

Standalone Operation

No backend server required: apart from calls to the ArXiv, Wikipedia, and Watsonx APIs, processing runs locally on your machine for privacy and control.

ArXiv Integration

Direct access to academic papers via the ArXiv API, with search and filtering capabilities.

AI-Powered Analysis

Generate paper summaries using Watsonx AI.

Enhanced PDF Processing

Advanced multi-layered PDF extraction with mathematical symbol preservation and Unicode normalization.

Cross-Platform

Runs on macOS, Linux, and Windows.

Multiple Formats

Output results in pretty, JSON, or compact formats to suit your workflow needs.
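
As a rough illustration of how the three output formats could be selected and rendered, here is a minimal sketch using only the Rust standard library. The names OutputFormat, Paper, json_escape, and render are hypothetical and are not taken from the Paperine source; the exact layout of each format is likewise an assumption.

```rust
use std::str::FromStr;

/// Hypothetical output formats, named after the three listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OutputFormat {
    Pretty,
    Json,
    Compact,
}

impl FromStr for OutputFormat {
    type Err = String;

    /// Parse a format name case-insensitively.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "pretty" => Ok(Self::Pretty),
            "json" => Ok(Self::Json),
            "compact" => Ok(Self::Compact),
            other => Err(format!("unknown output format: {other}")),
        }
    }
}

/// Hypothetical search hit; the field names are illustrative only.
struct Paper {
    id: String,
    title: String,
}

/// Escape the characters that must be escaped inside a JSON string.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Render one paper in the requested format (layouts are assumptions).
fn render(paper: &Paper, format: OutputFormat) -> String {
    match format {
        OutputFormat::Pretty => format!("Title: {}\nID:    {}", paper.title, paper.id),
        OutputFormat::Json => format!(
            "{{\"id\":\"{}\",\"title\":\"{}\"}}",
            json_escape(&paper.id),
            json_escape(&paper.title)
        ),
        OutputFormat::Compact => format!("{}\t{}", paper.id, paper.title),
    }
}
```

Keeping the format as an enum means an unsupported value is rejected once, at parse time, rather than at each place output is produced.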

📄 Enhanced PDF Processing

Multi-Layered Extraction

Primary Extraction (pdf-extract)

Fast, reliable extraction for standard PDF content

Secondary Extraction (lopdf)

Alternative method for complex or problematic PDFs

Mathematical Fallback

Specialized extraction with enhanced symbol recognition
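
The three layers above suggest a fallback chain: try the primary extractor, and move to the next layer only when the current one fails or yields no text. The sketch below shows that control flow with plain function pointers standing in for the real extractors. Extractor and extract_with_fallback are hypothetical names, and treating an empty result as a failure is an assumption; the actual pdf-extract and lopdf calls are deliberately left out.

```rust
/// One extraction layer: a label plus a function from raw PDF bytes to text.
/// In a real tool the layers would wrap pdf-extract, lopdf, and the
/// mathematical fallback; here they are plain function pointers (assumption).
struct Extractor {
    name: &'static str,
    run: fn(&[u8]) -> Result<String, String>,
}

/// Try each layer in order and return the first non-empty result together
/// with the name of the layer that produced it. If every layer fails,
/// return the collected error messages, one per layer.
fn extract_with_fallback(
    layers: &[Extractor],
    pdf: &[u8],
) -> Result<(&'static str, String), Vec<String>> {
    let mut errors = Vec::new();
    for layer in layers {
        match (layer.run)(pdf) {
            // A layer that "succeeds" with only whitespace is treated as a miss.
            Ok(text) if !text.trim().is_empty() => return Ok((layer.name, text)),
            Ok(_) => errors.push(format!("{}: produced no text", layer.name)),
            Err(e) => errors.push(format!("{}: {e}", layer.name)),
        }
    }
    Err(errors)
}
```

Returning the layer name alongside the text makes it possible to report which method handled a given PDF, which helps when diagnosing problematic files.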

Mathematical Symbol Support

Greek Letters

α β γ δ ε θ λ μ π σ φ ψ ω

Mathematical Operators

∑ ∫ ∂ ∇ ∞ ± ≤ ≥ ≠ ≈ ∈ ∉

Set Theory & Logic

∪ ∩ ∅ ℝ ℕ ℤ ℚ ℂ → ⇒ ∀ ∃

Comprehensive Unicode normalization ensures proper handling of mathematical notation in academic papers.
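
To make the idea of normalization concrete, here is a small sketch that rewrites a few code points PDF text layers often emit (typographic ligatures, the micro and ohm signs, no-break spaces) while leaving genuine mathematical symbols untouched. The function name normalize_math_text and the mapping table are assumptions for illustration; Paperine's actual rules are not documented here, and a full implementation would more likely rely on a Unicode normalization library.

```rust
/// Replace a handful of code points commonly found in extracted PDF text
/// with the form a reader would expect. The table is illustrative only.
fn normalize_math_text(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '\u{FB00}' => out.push_str("ff"),   // LATIN SMALL LIGATURE FF
            '\u{FB01}' => out.push_str("fi"),   // LATIN SMALL LIGATURE FI
            '\u{FB02}' => out.push_str("fl"),   // LATIN SMALL LIGATURE FL
            '\u{00B5}' => out.push('\u{03BC}'), // MICRO SIGN -> Greek small mu
            '\u{2126}' => out.push('\u{03A9}'), // OHM SIGN -> Greek capital omega
            '\u{00A0}' => out.push(' '),        // NO-BREAK SPACE -> plain space
            '\u{00AD}' => {}                    // SOFT HYPHEN dropped (a design choice)
            _ => out.push(c),                   // everything else, including ∑ ∫ ≤ ℝ, is kept
        }
    }
    out
}
```

For example, normalize_math_text("de\u{FB01}ne \u{00B5}") yields "define μ", with the ligature expanded and the micro sign replaced by the Greek letter.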

🏗️ Architecture

Core Components

CLI Interface: Clap-powered command-line parsing
External APIs: Direct ArXiv and Wikipedia integration
PDF Processing: Multi-layered extraction (pdf-extract, lopdf) with mathematical fallback
Document Processing: Local DOCX, TXT, MD parsing with Unicode normalization
AI Integration: Watsonx-powered summarization
Configuration: TOML-based settings management

Technical Stack

Async Runtime, Web Framework, Frontend UI, Watsonx AI, Local Storage

Built with modern technologies for performance, safety, and reliability.

🚀 Quick Start

Installation

# Clone the repository
git clone https://github.com/your-repo/paperine.git
cd paperine

# Build the project
cargo build --release

# Install globally
cargo install --path cli

Basic Usage

# Search ArXiv papers, returning at most 10 results
paperline-cli arxiv search "machine learning" --limit 10

# Download a paper by ArXiv ID and extract its text to Markdown
paperline-cli arxiv download 2210.07830 --extract-markdown

# Run a research query with AI summarization over up to 5 ArXiv papers
paperline-cli research query "mathematical optimization" --arxiv-limit 5
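
For readers curious what a search like the first command might translate to, the sketch below builds a request URL for the public ArXiv API (the query endpoint with its search_query, start, and max_results parameters). How Paperine actually maps its arguments onto the API is not documented here, so the all: field prefix, the start=0 offset, and the function names are assumptions.

```rust
/// Percent-encode everything except RFC 3986 unreserved characters.
fn percent_encode(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for b in s.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{b:02X}")),
        }
    }
    out
}

/// Build an ArXiv API query URL that searches all fields for `query`
/// and asks for at most `limit` results (hypothetical helper).
fn arxiv_query_url(query: &str, limit: u32) -> String {
    format!(
        "http://export.arxiv.org/api/query?search_query=all:{}&start=0&max_results={}",
        percent_encode(query),
        limit
    )
}
```

The API responds with an Atom feed, so a client would still need an HTTP request and an XML parsing step after building the URL.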

Configuration

Customize your experience with a simple TOML configuration file:

[search]
limit = 25
format = "pretty"

[ai]
model = "watsonx"
summarize = true
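
One way to picture the file above is as a pair of typed sections. The sketch below mirrors the two tables as Rust structs and adds a basic validity check. The struct names, the validate method, and the use of the example values as defaults are all assumptions; the real tool presumably deserializes the file with a TOML library instead.

```rust
/// Mirrors the [search] table from the example configuration.
#[derive(Debug, Clone, PartialEq)]
struct SearchConfig {
    limit: u32,
    format: String,
}

/// Mirrors the [ai] table from the example configuration.
#[derive(Debug, Clone, PartialEq)]
struct AiConfig {
    model: String,
    summarize: bool,
}

/// Whole configuration file (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Default)]
struct Config {
    search: SearchConfig,
    ai: AiConfig,
}

impl Default for SearchConfig {
    // Values copied from the example file; not confirmed to be the real defaults.
    fn default() -> Self {
        Self { limit: 25, format: "pretty".to_string() }
    }
}

impl Default for AiConfig {
    // Values copied from the example file; not confirmed to be the real defaults.
    fn default() -> Self {
        Self { model: "watsonx".to_string(), summarize: true }
    }
}

impl Config {
    /// Reject values that could not be honored: a zero result limit or an
    /// output format other than the three named in this document.
    fn validate(&self) -> Result<(), String> {
        if self.search.limit == 0 {
            return Err("search.limit must be at least 1".to_string());
        }
        if !["pretty", "json", "compact"].contains(&self.search.format.as_str()) {
            return Err(format!("search.format is not supported: {}", self.search.format));
        }
        Ok(())
    }
}
```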

📚 Documentation

Command Reference

Complete guide to all available commands and options.

API Integration

Learn how to integrate with ArXiv and Watsonx APIs.

User Guide

Complete user guide with examples and best practices for research workflows.