Paperine CLI

A standalone command-line research assistant with enhanced PDF processing and AI-powered analysis

Access ArXiv papers with multi-layered PDF extraction, mathematical symbol preservation, and AI-powered summarization, all without running a backend server.

🔍 ArXiv Search & Analysis
📄 Enhanced PDF Processing
Mathematical Symbol Support
🤖 AI Summarization
$ paperine search "machine learning"
✓ Found 25 papers from ArXiv
✓ AI summarization complete
✓ Results saved to output.json

🚀 Key Features

Standalone Operation

No backend server required: the CLI runs on your machine and contacts external services such as ArXiv and Watsonx directly, with no intermediary.

ArXiv Integration

Direct access to academic papers through the ArXiv API, with search and result filtering.

AI-Powered Analysis

Summarize papers with Watsonx AI for a quick overview of their content.

Enhanced PDF Processing

Advanced multi-layered PDF extraction with mathematical symbol preservation and Unicode normalization.

Cross-Platform

Fast, reliable operation on macOS, Linux, and Windows.

Multiple Formats

Output results in pretty, JSON, or compact formats to suit your workflow needs.
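The README does not define how the three formats differ. The sketch below is a rough illustration in Rust. It assumes a hypothetical OutputFormat enum and a minimal Paper record holding only an ID and a title, and it shows how a format name could be parsed and then rendered. The JSON is assembled by hand to keep the example dependency-free. A real implementation would more likely use serde_json.

```rust
use std::str::FromStr;

/// The three output modes named in the README (the enum itself is hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OutputFormat {
    Pretty,
    Json,
    Compact,
}

impl FromStr for OutputFormat {
    type Err = String;

    // Accept the names case-insensitively, e.g. from a flag or config value.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "pretty" => Ok(OutputFormat::Pretty),
            "json" => Ok(OutputFormat::Json),
            "compact" => Ok(OutputFormat::Compact),
            other => Err(format!("unknown output format: {}", other)),
        }
    }
}

/// Minimal search result record (hypothetical fields).
struct Paper {
    id: String,
    title: String,
}

/// Escape a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \u00XX form
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Render results in the requested mode.
fn render(papers: &[Paper], mode: OutputFormat) -> String {
    match mode {
        // Human-readable numbered list
        OutputFormat::Pretty => papers
            .iter()
            .enumerate()
            .map(|(i, p)| format!("{}. {}\n   arXiv:{}\n", i + 1, p.title, p.id))
            .collect(),
        // One tab-separated line per paper, easy to pipe into other tools
        OutputFormat::Compact => papers
            .iter()
            .map(|p| format!("{}\t{}\n", p.id, p.title))
            .collect(),
        // A single JSON array of objects
        OutputFormat::Json => {
            let items: Vec<String> = papers
                .iter()
                .map(|p| {
                    format!(
                        "{{\"id\":\"{}\",\"title\":\"{}\"}}",
                        json_escape(&p.id),
                        json_escape(&p.title)
                    )
                })
                .collect();
            format!("[{}]", items.join(","))
        }
    }
}

fn main() {
    let papers = vec![Paper { id: "2210.07830".to_string(), title: "Example title".to_string() }];
    let mode: OutputFormat = "json".parse().expect("valid format");
    println!("{}", render(&papers, mode));
}
```

Parsing through FromStr keeps format names in one place. A config value and a command-line flag can then share the same validation.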

📄 Enhanced PDF Processing

Multi-Layered Extraction

1. Primary Extraction (pdf-extract): fast, reliable extraction for standard PDF content
2. Secondary Extraction (lopdf): alternative method for complex or problematic PDFs
3. Mathematical Fallback: specialized extraction with enhanced symbol recognition
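The three layers work as a fallback chain: try each extractor in order and keep the first usable result. The sketch below shows only that control flow. The Layer struct, extract_with_fallback, and the three stub functions are hypothetical stand-ins, not the real pdf-extract or lopdf calls. Treating whitespace-only output as a failure is also an assumption.

```rust
/// Error from a single extraction layer.
#[derive(Debug)]
struct ExtractError(String);

/// One extraction layer: a display name plus a function from raw PDF bytes to text.
struct Layer {
    name: &'static str,
    extract: fn(&[u8]) -> Result<String, ExtractError>,
}

/// Try each layer in order and return the first non-empty text together with
/// the name of the layer that produced it. If every layer fails, return the
/// collected failure messages so the caller can report all of them.
fn extract_with_fallback(pdf: &[u8], layers: &[Layer]) -> Result<(String, &'static str), Vec<String>> {
    let mut failures = Vec::new();
    for layer in layers {
        match (layer.extract)(pdf) {
            Ok(text) if !text.trim().is_empty() => return Ok((text, layer.name)),
            Ok(_) => failures.push(format!("{}: empty output", layer.name)),
            Err(ExtractError(msg)) => failures.push(format!("{}: {}", layer.name, msg)),
        }
    }
    Err(failures)
}

// Hypothetical stand-ins: a real build would call pdf-extract, lopdf, and a
// symbol-aware fallback here instead.
fn primary_stub(_pdf: &[u8]) -> Result<String, ExtractError> {
    Err(ExtractError("unsupported font encoding".to_string()))
}

fn secondary_stub(_pdf: &[u8]) -> Result<String, ExtractError> {
    Ok("   ".to_string()) // whitespace only, so treated as a failure
}

fn math_fallback_stub(_pdf: &[u8]) -> Result<String, ExtractError> {
    Ok("∑ x_i ≤ ∞".to_string())
}

fn main() {
    let layers = [
        Layer { name: "pdf-extract", extract: primary_stub },
        Layer { name: "lopdf", extract: secondary_stub },
        Layer { name: "math-fallback", extract: math_fallback_stub },
    ];
    match extract_with_fallback(b"%PDF-1.7", &layers) {
        Ok((text, used)) => println!("[{}] {}", used, text),
        Err(failures) => eprintln!("all layers failed: {:?}", failures),
    }
}
```

Returning the name of the successful layer makes it easy to log which path handled a given paper.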

Mathematical Symbol Support

Greek Letters

α β γ δ ε θ λ μ π σ φ ψ ω

Mathematical Operators

∑ ∫ ∂ ∇ ∞ ± ≤ ≥ ≠ ≈ ∈ ∉

Set Theory & Logic

∪ ∩ ∅ ℝ ℕ ℤ ℚ ℂ → ⇒ ∀ ∃

Comprehensive Unicode normalization ensures proper handling of mathematical notation in academic papers.
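The normalization step is not described in detail. The sketch below assumes it does two things. It cleans up common extraction artifacts: ligatures, soft hyphens, special spaces, and Mathematical Italic letters. It also passes the symbols listed above through unchanged. The function names and the symbol-counting heuristic are hypothetical. A fuller approach might apply Unicode NFKC normalization, for example with the unicode-normalization crate. NFKC would also fold some symbols this section aims to preserve, such as ℝ to R.

```rust
/// Symbols from the Greek, operator, and set-theory lists above.
const MATH_SYMBOLS: &str = "αβγδεθλμπσφψω∑∫∂∇∞±≤≥≠≈∈∉∪∩∅ℝℕℤℚℂ→⇒∀∃";

/// Map Mathematical Italic letters (U+1D434..=U+1D467), which some PDFs emit
/// for variables, to plain ASCII letters. U+1D455 is unassigned (italic h
/// lives at U+210E, which this sketch leaves alone).
fn math_italic_to_ascii(c: char) -> Option<char> {
    let cp = c as u32;
    match cp {
        0x1D434..=0x1D44D => char::from_u32('A' as u32 + (cp - 0x1D434)),
        0x1D44E..=0x1D467 => char::from_u32('a' as u32 + (cp - 0x1D44E)),
        _ => None,
    }
}

/// Clean up common extraction artifacts while leaving math symbols untouched.
fn normalize_extracted(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        match c {
            // Typographic ligatures become separate letters
            '\u{FB00}' => out.push_str("ff"),
            '\u{FB01}' => out.push_str("fi"),
            '\u{FB02}' => out.push_str("fl"),
            '\u{FB03}' => out.push_str("ffi"),
            '\u{FB04}' => out.push_str("ffl"),
            // Soft hyphen is an invisible line-break hint, so drop it
            '\u{00AD}' => {}
            // No-break, thin, and narrow no-break spaces become ordinary spaces
            '\u{00A0}' | '\u{2009}' | '\u{202F}' => out.push(' '),
            // Everything else, including Greek letters and operators, passes
            // through unless it is a math-italic letter
            _ => out.push(math_italic_to_ascii(c).unwrap_or(c)),
        }
    }
    out
}

/// Count characters from MATH_SYMBOLS, e.g. to flag text that may need the
/// mathematical fallback layer (a hypothetical heuristic).
fn math_symbol_count(text: &str) -> usize {
    text.chars().filter(|c| MATH_SYMBOLS.contains(*c)).count()
}

fn main() {
    let raw = "The e\u{FB03}cient bound: \u{1D465} \u{2208} \u{211D}, \u{2200}\u{03B5}>0";
    let clean = normalize_extracted(raw);
    println!("{} ({} math symbols)", clean, math_symbol_count(&clean));
}
```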

🏗️ Architecture

Core Components

  • CLI Interface: Clap-powered command-line parsing
  • External APIs: Direct ArXiv and Wikipedia integration
  • PDF Processing: Multi-layered extraction (pdf-extract, lopdf) with mathematical fallback
  • Document Processing: Local DOCX, TXT, MD parsing with Unicode normalization
  • AI Integration: Watsonx-powered summarization
  • Configuration: TOML-based settings management
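Local inputs could be routed between the PDF Processing and Document Processing components by file extension. The classifier below sketches that idea. The DocKind enum and both functions are illustrative assumptions, not the project's actual module layout.

```rust
use std::path::Path;

/// Input kinds covered by the PDF and Document Processing components.
#[derive(Debug, PartialEq, Eq)]
enum DocKind {
    Pdf,
    Docx,
    Text,
    Markdown,
}

/// Classify a local file by its extension (case-insensitive).
fn classify(path: &Path) -> Option<DocKind> {
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "pdf" => Some(DocKind::Pdf),
        "docx" => Some(DocKind::Docx),
        "txt" => Some(DocKind::Text),
        "md" => Some(DocKind::Markdown),
        _ => None,
    }
}

/// Name the component that would handle each kind.
fn component_for(kind: &DocKind) -> &'static str {
    match kind {
        DocKind::Pdf => "PDF Processing",
        DocKind::Docx | DocKind::Text | DocKind::Markdown => "Document Processing",
    }
}

fn main() {
    for name in ["paper.PDF", "notes.md", "draft.docx", "data.csv"] {
        match classify(Path::new(name)) {
            Some(kind) => println!("{} -> {:?} ({})", name, kind, component_for(&kind)),
            None => println!("{} -> unsupported", name),
        }
    }
}
```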

Technical Stack

  • Async Runtime
  • Web Framework
  • Frontend UI
  • Watsonx AI
  • Local Storage

Built with modern technologies for performance, safety, and reliability.

🚀 Quick Start

Installation

# Clone the repository
git clone https://github.com/your-repo/paperine.git
cd paperine

# Build the project
cargo build --release

# Install globally
cargo install --path cli

Basic Usage

# Search ArXiv papers with enhanced PDF extraction
paperline-cli arxiv search "machine learning" --limit 10

# Download mathematical papers with enhanced extraction
paperline-cli arxiv download 2210.07830 --extract-markdown

# Research with AI summarization
paperline-cli research query "mathematical optimization" --arxiv-limit 5
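The arxiv search and arxiv download commands have to build requests to ArXiv. The sketch below shows one plausible way. It assumes they use the public ArXiv query API at export.arxiv.org/api/query, with its search_query, start and max_results parameters, plus the arxiv.org/pdf/ download path. The helper names are hypothetical. The ID check covers only new-style identifiers such as 2210.07830, optionally with a version suffix like v2. No network request is made.

```rust
/// Percent-encode a string for a URL query component, keeping only
/// RFC 3986 unreserved characters as-is.
fn percent_encode(s: &str) -> String {
    let mut out = String::new();
    for b in s.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => out.push(b as char),
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Build an ArXiv API search URL; the query is quoted so it matches as a phrase.
fn search_url(query: &str, limit: u32) -> String {
    let phrase = percent_encode(&format!("\"{}\"", query));
    format!(
        "https://export.arxiv.org/api/query?search_query=all:{}&start=0&max_results={}",
        phrase, limit
    )
}

/// Check for a new-style ArXiv ID: YYMM.NNNN or YYMM.NNNNN, optional vN suffix.
fn is_new_style_id(id: &str) -> bool {
    let (base, version) = match id.split_once('v') {
        Some((b, v)) => (b, Some(v)),
        None => (id, None),
    };
    if let Some(v) = version {
        if v.is_empty() || !v.bytes().all(|b| b.is_ascii_digit()) {
            return false;
        }
    }
    match base.split_once('.') {
        Some((yymm, num)) => {
            yymm.len() == 4
                && yymm.bytes().all(|b| b.is_ascii_digit())
                && (num.len() == 4 || num.len() == 5)
                && num.bytes().all(|b| b.is_ascii_digit())
        }
        None => false,
    }
}

/// PDF download URL for a valid new-style ID.
fn pdf_url(id: &str) -> Option<String> {
    is_new_style_id(id).then(|| format!("https://arxiv.org/pdf/{}", id))
}

fn main() {
    println!("{}", search_url("machine learning", 10));
    match pdf_url("2210.07830") {
        Some(url) => println!("{}", url),
        None => eprintln!("not a new-style ArXiv ID"),
    }
}
```

Old-style identifiers such as hep-th/9901001 are valid on ArXiv but are rejected by this simplified check.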

Configuration

Customize your experience with a simple TOML configuration file:

[search]
limit = 25
format = "pretty"

[ai]
model = "watsonx"
summarize = true
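The README does not say how this file is loaded. The sketch below assumes the keys map onto a settings struct. It handles only the flat [section] / key = value subset shown above and rejects unknown keys. It also assumes the example values double as defaults. A real implementation would more likely use the toml and serde crates than a hand-rolled parser.

```rust
/// Settings mirroring the example file (hypothetical struct).
#[derive(Debug, PartialEq)]
struct Config {
    limit: u32,
    format: String,
    model: String,
    summarize: bool,
}

impl Default for Config {
    // Assumed defaults, taken from the example file above.
    fn default() -> Self {
        Config {
            limit: 25,
            format: "pretty".to_string(),
            model: "watsonx".to_string(),
            summarize: true,
        }
    }
}

/// Parse the flat [section] / key = value subset used in the example.
/// Not a general TOML parser: no arrays, inline tables, or string escapes.
fn parse_config(src: &str) -> Result<Config, String> {
    let mut cfg = Config::default();
    let mut section = String::new();
    for (i, raw) in src.lines().enumerate() {
        let n = i + 1; // 1-based line number for error messages
        let line = raw.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        if line.starts_with('[') && line.ends_with(']') {
            section = line[1..line.len() - 1].trim().to_string();
            continue;
        }
        let (key, value) = line
            .split_once('=')
            .ok_or_else(|| format!("line {}: expected key = value", n))?;
        let (key, value) = (key.trim(), value.trim());
        let unquote = |v: &str| v.trim_matches('"').to_string();
        match (section.as_str(), key) {
            ("search", "limit") => {
                cfg.limit = value.parse::<u32>().map_err(|e| format!("line {}: {}", n, e))?
            }
            ("search", "format") => cfg.format = unquote(value),
            ("ai", "model") => cfg.model = unquote(value),
            ("ai", "summarize") => {
                cfg.summarize = value.parse::<bool>().map_err(|e| format!("line {}: {}", n, e))?
            }
            _ => return Err(format!("line {}: unknown key {}.{}", n, section, key)),
        }
    }
    Ok(cfg)
}

fn main() {
    let src = "[search]\nlimit = 25\nformat = \"pretty\"\n\n[ai]\nmodel = \"watsonx\"\nsummarize = true\n";
    match parse_config(src) {
        Ok(cfg) => println!("{:?}", cfg),
        Err(e) => eprintln!("config error: {}", e),
    }
}
```

Rejecting unknown keys surfaces typos such as "sumarize" immediately instead of silently ignoring them.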

📚 Documentation

Command Reference

Complete guide to all available commands and options.

API Integration

Learn how to integrate with ArXiv and Watsonx APIs.

User Guide

Complete user guide with examples and best practices for research workflows.