FAQ

Frequently asked questions about ArxivCSExplorer.

What is ArxivCSExplorer?

ArxivCSExplorer is a fast, AI-powered search engine for arXiv papers. It provides semantic search, pre-generated AI summaries, citation tracking, bookmarks with collections, paper comparison, claim classification, achievements, and personalized recommendations — all without requiring login.

Where does the paper data come from?

Papers are fetched directly from the official arXiv API (export.arxiv.org). Additional metadata comes from Semantic Scholar (citations), CrossRef (journal info), OpenAlex (concepts, affiliations), and Papers With Code (repositories, benchmarks).
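
As a rough illustration of what a request to the arXiv API looks like, the sketch below builds a query URL for the newest papers in one category. The function name and defaults are hypothetical and not taken from the ArxivCSExplorer codebase; the endpoint and parameter names (search_query, start, max_results, sortBy, sortOrder) are those of the public arXiv API.

```typescript
// Illustrative helper (hypothetical name): builds a query URL for the
// public arXiv API hosted at export.arxiv.org.
function buildArxivQueryUrl(
  category: string,
  start: number = 0,
  maxResults: number = 10
): string {
  const params: Array<[string, string]> = [
    ["search_query", `cat:${category}`],
    ["start", String(start)],
    ["max_results", String(maxResults)],
    ["sortBy", "submittedDate"],
    ["sortOrder", "descending"],
  ];
  // Percent-encode each value so characters such as ":" are URL-safe.
  const query = params
    .map(([name, value]) => `${name}=${encodeURIComponent(value)}`)
    .join("&");
  return `https://export.arxiv.org/api/query?${query}`;
}
```

Calling buildArxivQueryUrl("cs.LG", 0, 5) would produce a URL requesting the five most recently submitted cs.LG papers; the API responds with an Atom feed.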

What AI models power the summaries?

Summaries are generated by Llama 3.1 8B Instruct via Cloudflare Workers AI for live inference, or locally via Ollama (Gemma 4 8B) for bulk ingestion. Embeddings use BGE-base-en-v1.5 (Workers AI) or nomic-embed-text (Ollama). A single structured JSON prompt generates TL;DR, contributions, methods, limitations, beginner/technical explanations, keywords, paper type, prerequisites, and follow-up questions.

Are the AI summaries accurate?

They are generally reliable for well-structured abstracts, but AI summaries can occasionally miss nuance or misrepresent results. Always read the original abstract — or the full paper — for anything you intend to cite or build on.

How does the hybrid search work?

Search combines SQLite FTS5 keyword search (column weights of 10:1:5 for title, abstract, and authors) with Cloudflare Vectorize semantic search. Results are merged with 25% keyword / 75% semantic weighting, deduplicated, and cached in KV with a 2-hour TTL.
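
The merge step can be sketched as a weighted sum over two result lists. This is a minimal sketch, not the site's actual code: it assumes both score sets are already normalized to the range 0 to 1 and that each list contains a given paper at most once. The names ScoredPaper and mergeHybrid are hypothetical.

```typescript
// Hypothetical result shape; scores are assumed to be normalized to [0, 1].
interface ScoredPaper {
  id: string;
  score: number;
}

// Weighted fusion: a paper found by both searches gets both contributions,
// and keying by paper ID deduplicates the merged list.
function mergeHybrid(
  keyword: ScoredPaper[],
  semantic: ScoredPaper[],
  keywordWeight: number = 0.25,
  semanticWeight: number = 0.75
): ScoredPaper[] {
  const combined = new Map<string, number>();
  for (const hit of keyword) {
    combined.set(hit.id, (combined.get(hit.id) ?? 0) + keywordWeight * hit.score);
  }
  for (const hit of semantic) {
    combined.set(hit.id, (combined.get(hit.id) ?? 0) + semanticWeight * hit.score);
  }
  // Highest combined score first.
  return Array.from(combined, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score
  );
}
```

With these weights, a paper that only matches by keyword can score at most 0.25, so a strong semantic match will usually outrank it.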

What search filters are available?

You can filter by author (substring match), minimum citation count, arXiv category (cs.LG, cs.CL, etc.), and date range (day/week/month). All filters work together and with hybrid search. Each filter combination gets its own KV cache key.
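
One way to give each filter combination its own cache key is to serialize the filters in a fixed order, so that the same combination always maps to the same key. The sketch below is an assumption about how such a key could be built; the SearchFilters shape, the field abbreviations, and the "search:" prefix are all hypothetical.

```typescript
// Hypothetical filter shape mirroring the filters described above.
interface SearchFilters {
  query: string;
  author?: string;
  minCitations?: number;
  category?: string;
  dateRange?: "day" | "week" | "month";
}

// Builds a deterministic key: fields are emitted in a fixed order and
// unset filters are skipped, so equivalent requests share one cache entry.
function searchCacheKey(filters: SearchFilters): string {
  const entries: Array<[string, string | number | undefined]> = [
    ["a", filters.author?.trim().toLowerCase()],
    ["c", filters.category],
    ["d", filters.dateRange],
    ["m", filters.minCitations],
    ["q", filters.query.trim().toLowerCase()],
  ];
  const parts: string[] = [];
  for (const [name, value] of entries) {
    if (value !== undefined && value !== "") {
      parts.push(`${name}=${encodeURIComponent(String(value))}`);
    }
  }
  return `search:${parts.join("&")}`;
}
```

Normalizing case and whitespace before building the key is a design choice that trades a little precision for a higher cache hit rate.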

How do bookmarks and collections work?

Bookmarks are stored client-side in localStorage with a 90-day TTL. You can create named collections, assign bookmarks to collections, add notes, track reading status (unread/reading/done), and export as JSON or BibTeX. There's a soft cap of 100 bookmarks with automatic pruning of the oldest entries. No login is required.
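
The expiry and pruning rules can be illustrated with a pure function over a bookmark list. This is a sketch under stated assumptions: the Bookmark shape and function name are hypothetical, the TTL is measured from the time the bookmark was saved, and the real storage layer (localStorage reads and writes) is left out.

```typescript
// Hypothetical bookmark record; savedAt is a Unix timestamp in milliseconds.
interface Bookmark {
  paperId: string;
  savedAt: number;
}

const BOOKMARK_TTL_MS = 90 * 24 * 60 * 60 * 1000; // 90 days
const BOOKMARK_SOFT_CAP = 100;

// Drops expired bookmarks, then keeps only the newest entries up to the cap.
// Returns a new array sorted newest first; the input is not modified.
function pruneBookmarks(bookmarks: Bookmark[], now: number): Bookmark[] {
  const fresh = bookmarks.filter((b) => now - b.savedAt < BOOKMARK_TTL_MS);
  fresh.sort((a, b) => b.savedAt - a.savedAt);
  return fresh.slice(0, BOOKMARK_SOFT_CAP);
}
```

In a browser, the result would typically be written back with localStorage.setItem after each change.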

Where do citation counts come from?

Citation counts are fetched from the Semantic Scholar API by an hourly cron job. Historical citation data is stored in the citation_snapshots table. Both the citation count and the influential citation count are tracked.

How are related papers computed?

When a paper is ingested, we query Vectorize for the top-8 semantically similar papers by embedding cosine similarity. These are pre-computed and stored in the related_papers table, so related papers load instantly without querying Vectorize during user requests.
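
The ranking that Vectorize performs can be illustrated with a small in-memory equivalent. The sketch below is not how the service is implemented (the vector index does this work at scale); it only shows what "top-8 by cosine similarity" means. The EmbeddedPaper shape and function names are hypothetical.

```typescript
// Hypothetical record pairing a paper ID with its embedding vector.
interface EmbeddedPaper {
  id: string;
  vector: number[];
}

// Cosine similarity: dot product divided by the product of vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}

// Returns the IDs of the k papers most similar to the target, excluding
// the target itself, ordered from most to least similar.
function topRelated(
  target: EmbeddedPaper,
  corpus: EmbeddedPaper[],
  k: number = 8
): string[] {
  return corpus
    .filter((paper) => paper.id !== target.id)
    .map((paper) => ({
      id: paper.id,
      similarity: cosineSimilarity(target.vector, paper.vector),
    }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, k)
    .map((entry) => entry.id);
}
```

Storing the resulting ID list at ingestion time is what lets the detail page skip the similarity query entirely.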

Can I compare multiple papers side-by-side?

Yes — use the /compare route with up to 6 paper IDs (e.g., /compare?ids=2301.07041,2302.13971). The comparison view shows TL;DR, key contributions, methods, limitations, and technical summaries in a responsive grid layout.
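
Parsing the ids parameter could look like the sketch below. The function name is hypothetical, and truncating to the first 6 unique IDs is an assumption: the actual route may reject oversized requests instead.

```typescript
const MAX_COMPARE_IDS = 6;

// Extracts paper IDs from a query string such as "?ids=2301.07041,2302.13971".
// Blank entries and duplicates are dropped; at most 6 IDs are returned.
function parseCompareIds(search: string): string[] {
  const match = /(?:^|[?&])ids=([^&]*)/.exec(search);
  if (!match) {
    return [];
  }
  const ids = decodeURIComponent(match[1])
    .split(",")
    .map((id) => id.trim())
    .filter((id) => id.length > 0);
  // A Set preserves insertion order, so the first occurrence of each ID wins.
  return Array.from(new Set(ids)).slice(0, MAX_COMPARE_IDS);
}
```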

What is claim classification?

Claim classification uses AI to label a scientific claim as supported, contradicted, or neutral based on a paper's content. It's powered by Llama 3.1 and helps verify research statements.

Is there an RSS feed?

Yes — subscribe to /rss.xml for trending papers from the last week with TL;DR summaries. The feed is cached for 1 hour and updates as new papers are ingested.

What are achievements?

Achievements are gamified badges (bronze/silver/gold tiers) for paper exploration milestones: first bookmark, reading 10 papers, visiting 5 topics, comparing papers, building reading streaks, etc. All achievement data is stored client-side in localStorage, with no login required.

Is there a rate limit?

No login or API key is required. Standard per-IP rate limiting applies (100 requests/minute with a 2-minute lockout on excess) to protect the service. Normal browsing stays well below this limit. The ingestion pipeline processes at most 1 paper per minute, capped at 113 papers per day, to stay within free-tier AI budgets.
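
The limit described above can be modeled as a fixed-window counter with a lockout. This is a minimal in-memory sketch, not the service's implementation: the class name is hypothetical, the window type (fixed rather than sliding) is an assumption, and a real edge deployment would need shared state rather than a local Map.

```typescript
interface ClientState {
  windowStart: number; // start of the current counting window (ms)
  count: number; // requests seen in the current window
  lockedUntil: number; // requests are refused before this time (ms)
}

class FixedWindowRateLimiter {
  private readonly clients = new Map<string, ClientState>();
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly lockoutMs: number;

  constructor(
    limit: number = 100,
    windowMs: number = 60 * 1000,
    lockoutMs: number = 2 * 60 * 1000
  ) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.lockoutMs = lockoutMs;
  }

  // Returns true if the request from this IP is allowed at time `now` (ms).
  allow(ip: string, now: number): boolean {
    let state = this.clients.get(ip);
    if (!state) {
      state = { windowStart: now, count: 0, lockedUntil: 0 };
      this.clients.set(ip, state);
    }
    if (now < state.lockedUntil) {
      return false; // still locked out
    }
    if (now - state.windowStart >= this.windowMs) {
      state.windowStart = now; // start a fresh window
      state.count = 0;
    }
    state.count += 1;
    if (state.count > this.limit) {
      state.lockedUntil = now + this.lockoutMs; // exceeded: begin lockout
      return false;
    }
    return true;
  }
}
```

Passing the current time as an argument, instead of reading the clock inside the method, keeps the limiter easy to test.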

How do I look up a specific paper?

Paste the arXiv ID (e.g. 2312.00752) or the full arXiv URL into the search box. You'll be taken directly to the paper detail page with AI summary, related papers, and citation data.
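
Recognizing an arXiv ID or URL in the search box could be done along these lines. The function name and the exact patterns are assumptions for illustration, and stripping the version suffix (such as v2) is a design choice the real site may not share.

```typescript
// New-style IDs look like 2312.00752; old-style IDs look like hep-th/9901001.
// Both may carry a version suffix such as v2.
const NEW_STYLE_ID = /^(\d{4}\.\d{4,5})(v\d+)?$/;
const OLD_STYLE_ID = /^([a-z-]+(?:\.[A-Z]{2})?\/\d{7})(v\d+)?$/;
const ARXIV_URL =
  /^https?:\/\/(?:www\.|export\.)?arxiv\.org\/(?:abs|pdf)\/(.+?)(?:\.pdf)?(?:[?#].*)?$/i;

// Returns the bare arXiv ID (without version) if the input is an ID or an
// arxiv.org abs/pdf URL, or null if the input should be treated as a query.
function extractArxivId(input: string): string | null {
  let candidate = input.trim();
  const urlMatch = ARXIV_URL.exec(candidate);
  if (urlMatch) {
    candidate = urlMatch[1];
  }
  const idMatch = NEW_STYLE_ID.exec(candidate) ?? OLD_STYLE_ID.exec(candidate);
  return idMatch ? idMatch[1] : null;
}
```

A search handler could call extractArxivId first and fall back to hybrid search whenever it returns null.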

Can AI assistants use this?

Yes — install the arxiv-cli tool for programmatic access (search, trending, topics, authors). The CLI is designed for AI assistants like Claude and ChatGPT. There are also /ai.txt and /llms.txt routes for agent discovery.

What is the tech stack?

The frontend is Next.js 16 deployed as a Cloudflare Worker via OpenNext. The API runs on Cloudflare Workers, with D1 (SQLite) as the database, Vectorize for embeddings, KV for caching, and Workers AI for inference. Everything is deployed globally at the edge with ISR rendering (10-minute revalidation).

Can I access the source code?

Yes. The source is publicly available under BSL 1.1 (free for non-commercial use, converts to MIT on 2029-06-01). Find it on GitHub via the link in the footer.

Who built this?

ArxivCSExplorer was built by Teycir Ben Soltane as a fast, semantic-first research paper explorer with zero-login access to all features.
