News curator
This software extracts new information from daily news and produces compact daily digests that report only novelties, excluding background or evergreen content. A reader well versed in a specific domain can thus quickly absorb the dense information that matters from selected article feeds.
This software is distributed under the GNU GPL v3.
Why this project
This repository is built as an end-to-end AI engineering project around a practical task (daily news monitoring):
- ingestion of articles from RSS feeds,
- content extraction from raw HTML,
- LLM-based takeaway generation,
- embedding-based deduplication with pgvector,
- daily report generation,
- offline evaluation scripts for model comparison.
Engineering choices
Here is the implemented workflow, with comments on the engineering choices:
- The user provides a list of article feeds.
- The feeds may cover a specific topic, such as AI, or multiple topics.
- The articles are downloaded and stored in PostgreSQL.
- PostgreSQL was chosen as a general-purpose database that can also serve as a vector database (see below).
- The download is a script designed to be run as an idempotent scheduled job.
- A takeaway captures the core new information conveyed by each downloaded article, and is stored in PostgreSQL with the pgvector extension.
- A takeaway is generated by an LLM from the article's title and content.
- The embedding model BAAI/bge-large-en-v1.5 is used for strong retrieval performance.
- The vector database is PostgreSQL with the pgvector extension, so that a single consolidated database serves both articles and takeaways.
- The daily review is generated from the takeaway store, with deduplication.
- The takeaways generated for today (or another target date) are compiled in a Markdown review, with links to the original articles for more information.
- Today's takeaways are deduplicated by selecting one takeaway (the longest) among all similar takeaways covering the same piece of news.
- Takeaways that are similar to past takeaways (from previous days) are filtered out, since they have already been included in a past review.
- Similar takeaways are detected using (1) a cosine-similarity threshold between the embeddings in the takeaway store, and (2) a second, finer filtering pass with a cross-encoder.
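As a rough illustration of the first, threshold-based pass (this is a standalone sketch, not the repository's actual implementation, and the 0.9 threshold is a placeholder), duplicates can be clustered greedily by cosine similarity while keeping only the longest takeaway in each cluster:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def deduplicate(takeaways: list[str], embeddings: list[list[float]],
                threshold: float = 0.9) -> list[str]:
    """Greedy dedup: among similar takeaways, keep only the longest one."""
    kept: list[int] = []  # indices of retained takeaways
    for i in range(len(takeaways)):
        duplicate_of = None
        for j in kept:
            if cosine_similarity(embeddings[i], embeddings[j]) >= threshold:
                duplicate_of = j
                break
        if duplicate_of is None:
            kept.append(i)
        elif len(takeaways[i]) > len(takeaways[duplicate_of]):
            # Replace the retained takeaway with the longer variant.
            kept[kept.index(duplicate_of)] = i
    return [takeaways[i] for i in kept]
```

In the actual pipeline, candidates passing this threshold are additionally re-checked with a cross-encoder before being treated as duplicates.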
Limitations and future work:
- The similarity thresholds used to select duplicates, and potential duplicates before cross-encoder evaluation, were derived from manual investigation. They should instead be derived from a larger curated set of articles covering the same news, with the calibration implemented as a program included in the repository.
- Integration tests run on SQLite (faster and easier to run temporarily), while production vector similarity uses PostgreSQL + pgvector.
Architecture
The core module follows a Domain-Driven Design (DDD):
- src/news_curator/domain/: domain entities (Article, Takeaway, Review)
- src/news_curator/application/: use cases and abstract interfaces
- src/news_curator/infrastructure/: fetchers, persistence, LLM adapters
- src/news_curator/presentation/: CLI entry points
Additional directories:
- eval/: evaluation scripts for takeaway quality
- ops/: operational scripts (PostgreSQL, LLM start, pipeline helpers)
- tests/: unit and integration tests
- files/reviews/: generated daily reviews
Tech stack and tools:
- Fetching: Playwright
- HTML parsing: Trafilatura
- Storage: PostgreSQL (article store) with pgvector (takeaway store) and SQLAlchemy
- LLM: OpenAI-compatible API (OpenAI, vLLM)
- Bi-encoder: BAAI/bge-large-en-v1.5 embeddings for the vector store
- Cross-encoder: semantic textual similarity for evaluation and review deduplication
- Packaging: Docker, Poetry
- Quality: Pytest, Black, mypy
Installation
1) Prerequisites
- Python 3.13.x
- Poetry (Ubuntu package: python3-poetry)
- Docker + Docker Compose v2 (Ubuntu packages: docker.io and docker-compose-v2)
- (Optional) pyenv
2) Install dependencies
# Optional, if using pyenv:
pyenv install 3.13.11
pyenv local 3.13.11
poetry env use 3.13.11
poetry install
# Install Playwright browsers.
poetry run playwright install
# Ubuntu 24.04, if Playwright requires extra system libraries:
sudo apt-get install libgstreamer-plugins-bad1.0-0 libavif16
Quick demo
This workflow is isolated from the development workflow (database, LLM inference engine). It is provided to demonstrate an end-to-end run of the review generation pipeline.
1) Start and initialize the demo PostgreSQL
poetry run ops/scripts/demo/pg_start
poetry run ops/scripts/demo/pg_init
2) Start a local LLM server (vLLM)
ENV_FILE=examples/.env.demo poetry run ops/scripts/llm_start
3) Run the demo pipeline
poetry run ops/scripts/demo/run_pipeline
The demo pipeline uses examples/demo_ai.yaml and writes the review to examples/demo_review_2026-03-06.md.
The 2026-03-06 review should look like:
1. Microsoft and Google confirm Anthropic Claude remains available to non-defense customers despite Defense Department's supply-chain risk designation. [ https://techcrunch.com/2026/03/06/microsoft-anthropic-claude-remains-available-to-customers-except-the-defense-depar>
2. Anthropic's Claude identified 22 vulnerabilities in Firefox, 14 of which are high-severity, in just two weeks, significantly contributing to Firefox's security updates. [ https://techcrunch.com/2026/03/06/anthropics-claude-found-22-vulnerabilities-in-firefox-over-two>
3. City Detect, using AI to monitor building health, raises $13M Series A to automate building inspections, enabling cities to track and address issues faster than human crews. [ https://techcrunch.com/2026/03/06/city-detect-uses-ai-to-help-cities-stay-safe-and-clean/ ]
The configuration file examples/demo_ai.yaml includes 6 articles, but 2 articles are from the previous day (2026-03-05), and one of the articles from 2026-03-06 (https://techcrunch.com/2026/03/06/after-europe-whatsapp-will-let-rival-ai-companies-offer-chatbots-in-brazil/) reports news already covered by an article from 2026-03-05 (https://www.globalbankingandfinance.com/meta-allow-ai-rivals-whatsapp-bid-stave-off-eu-action/). It is therefore detected as a duplicate, already covered in the 2026-03-05 review, and excluded from the review.
4) Stop or remove demo PostgreSQL resources
poetry run ops/scripts/demo/pg_stop
poetry run ops/scripts/demo/pg_remove
Development workflow
1) Configure environment
cp examples/.env .env
Then edit .env if needed. The default values are sufficient to run the local pipeline. The evaluation part (LLM-as-a-judge) requires an OpenAI key or a local OpenAI-compatible API.
2) Start and initialize PostgreSQL
poetry run ops/scripts/pg_start
poetry run ops/scripts/pg_init
poetry run ops/scripts/pg_init_test
ops/scripts/pg_init_test creates an additional database named news_curator_test, with pgvector enabled. It is required to run certain PostgreSQL integration tests.
To stop PostgreSQL later:
poetry run ops/scripts/pg_stop
To fully clean PostgreSQL resources (container, network, image, volume):
poetry run ops/scripts/pg_remove
3) Start a local LLM server (vLLM)
poetry run ops/scripts/llm_start
Run the pipeline
1) Download articles
Save a YAML file listing RSS feeds, e.g., ai.yaml:
feeds:
- name: TechCrunch
url: https://techcrunch.com/category/artificial-intelligence/feed/
- name: MIT
url: https://news.mit.edu/topic/mitartificial-intelligence2-rss.xml
To download and store the articles from these sources, run:
poetry run python src/news_curator/presentation/run_download_articles.py -c ai.yaml
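Conceptually, feed ingestion boils down to reading each feed's RSS XML and extracting the item titles and links; the repository then fetches and parses the full articles with Playwright and Trafilatura. A minimal standard-library sketch of the RSS step (not the repository's actual code):

```python
import xml.etree.ElementTree as ET


def parse_rss_items(rss_xml: str) -> list[dict[str, str]]:
    """Extract title/link pairs from the <item> elements of an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        })
    return items


sample = """<rss version="2.0"><channel><title>Demo feed</title>
<item><title>First article</title><link>https://example.com/a</link></item>
<item><title>Second article</title><link>https://example.com/b</link></item>
</channel></rss>"""
```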
2) Extract takeaways
From the articles now in the article store, generate the takeaways for each article and store them in the takeaway store with:
poetry run python src/news_curator/presentation/run_extract_takeaways.py --llm-chat-completions
3) Generate daily review
The review for a certain date is generated with:
poetry run python src/news_curator/presentation/run_generate_review.py --date 2026-03-04
Write today's review to a Markdown file:
mkdir -p files/reviews
DATE=$(date -I)
poetry run python src/news_curator/presentation/run_generate_review.py --date "$DATE" > "files/reviews/review-${DATE}.md"
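The review itself is essentially a numbered Markdown list of takeaways with links back to the source articles, as in the demo output shown earlier. A minimal rendering sketch (the real format is produced by run_generate_review.py; the field names here are illustrative):

```python
def render_review(takeaways: list[dict[str, str]]) -> str:
    """Render takeaways as a numbered Markdown list with article links."""
    lines = [
        f"{i}. {t['text']} [ {t['url']} ]"
        for i, t in enumerate(takeaways, start=1)
    ]
    return "\n".join(lines)
```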
4) Run all steps with helper script
This generates yesterday's and today's reviews:
poetry run ops/scripts/run_pipeline
Evaluation workflow
The eval/takeaways/ scripts A) compare takeaway quality across models and B) support LLM-as-a-judge scoring. They are used to compare various LLMs and select the best for the task.
For both evaluations A and B, a number of articles (e.g., 200) are first extracted from the article store.
Evaluation A
- Generate a takeaway per article, with a high-quality reference model and with the production model.
- Compare, using a cross-encoder, the reference takeaways with the production takeaways, and assign semantic similarity scores.
- Use the scores to compare multiple production models, and select the one closest to the reference LLM.
Example
This is an example for evaluation A, with similarity scores between production takeaways and reference takeaways (higher is better). This example is available in file examples/eval/takeaways/compare_takeaways.md and generated by eval.takeaways.compare_takeaways.
Reference model: gpt-5
Scoring model: cross-encoder/stsb-roberta-large
| Production model | Articles | Min | P25 | Median | Mean | P75 | Max | Stddev |
|---|---|---|---|---|---|---|---|---|
| unsloth/gemma-3-12b-it-GGUF | 100 | 0.3102 | 0.6412 | 0.6885 | 0.6698 | 0.7283 | 0.8031 | 0.0987 |
| Qwen/Qwen2.5-3B-Instruct | 100 | 0.3404 | 0.5990 | 0.6550 | 0.6464 | 0.7025 | 0.8433 | 0.0900 |
| Qwen/Qwen2.5-0.5B-Instruct | 100 | 0.2631 | 0.5644 | 0.6202 | 0.6074 | 0.6784 | 0.7887 | 0.1057 |
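The summary columns in the table above are standard descriptive statistics over the per-article similarity scores. A sketch of how such a row could be computed with the standard library (the repository's exact percentile convention may differ; statistics.quantiles uses the exclusive method by default):

```python
import statistics


def summarize_scores(scores: list[float]) -> dict[str, float]:
    """Descriptive statistics for a list of per-article similarity scores."""
    q1, _, q3 = statistics.quantiles(scores, n=4)  # quartiles
    return {
        "min": min(scores),
        "p25": q1,
        "median": statistics.median(scores),
        "mean": statistics.mean(scores),
        "p75": q3,
        "max": max(scores),
        "stddev": statistics.stdev(scores),
    }
```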
Evaluation B
- Provide a strong LLM-as-a-judge with the articles and corresponding takeaways.
- Let the LLM-as-a-judge return a numerical score and a written explanation of its evaluation.
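As a hedged illustration of the mechanics (the actual prompt and scoring scale live in eval/takeaways/), a judge call typically builds a prompt from the article and its takeaway, then parses a numeric score out of the judge's reply:

```python
import re


def build_judge_prompt(article: str, takeaway: str) -> str:
    """Hypothetical judge prompt; the real one is defined in eval/takeaways/."""
    return (
        "Rate from 1 (poor) to 4 (excellent) how well the takeaway captures "
        "the article's core new information. Reply as 'Score: N' followed by "
        f"a short justification.\n\nArticle:\n{article}\n\nTakeaway:\n{takeaway}"
    )


def parse_judge_score(reply: str) -> int:
    """Extract the integer score from a reply such as 'Score: 3 ...'."""
    match = re.search(r"Score:\s*([1-4])", reply)
    if match is None:
        raise ValueError(f"No score found in judge reply: {reply!r}")
    return int(match.group(1))
```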
Example
This is an example for evaluation B, with mean score and score distribution. This example is available in file examples/eval/takeaways/llm_as_judge.md and generated by eval.takeaways.report_llm_as_judge.
| Production model | Judge model | Articles | Mean | Score 1 (%) | Score 2 (%) | Score 3 (%) | Score 4 (%) |
|---|---|---|---|---|---|---|---|
| unsloth/gemma-3-12b-it-GGUF | gpt-5 | 100 | 3.0400 | 5.0% | 14.0% | 53.0% | 28.0% |
| Qwen/Qwen2.5-3B-Instruct | gpt-5 | 100 | 2.3900 | 19.0% | 26.0% | 52.0% | 3.0% |
| Qwen/Qwen2.5-0.5B-Instruct | gpt-5 | 100 | 2.0400 | 28.0% | 43.0% | 26.0% | 3.0% |
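The per-score percentage columns above are simply the distribution of judge scores across the evaluated articles. A sketch of the aggregation, assuming the 1-4 scale shown in the table:

```python
from collections import Counter


def score_distribution(scores: list[int]) -> dict[int, float]:
    """Percentage of takeaways receiving each judge score (1-4)."""
    counts = Counter(scores)
    total = len(scores)
    return {s: 100.0 * counts.get(s, 0) / total for s in range(1, 5)}
```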
1) Build an evaluation article dataset
mkdir -p data/eval/takeaways
poetry run python -m eval.takeaways.extract_dataset_articles --num-articles 200 --output data/eval/takeaways/articles.parquet
2) Generate takeaways with a reference model
This step is needed for evaluation A only.
poetry run python -m eval.takeaways.generate_dataset_takeaways \
--input data/eval/takeaways/articles.parquet \
--output data/eval/takeaways/ref_takeaways.parquet \
--llm-id gpt-5-nano \
--llm-base-url https://api.openai.com/v1/ \
--llm-api-key "$OPENAI_API_KEY"
3) Generate takeaways with a production model
This step is needed for evaluation A and B. It can be run with different candidate production models to compare them and select the best.
poetry run python -m eval.takeaways.generate_dataset_takeaways \
--input data/eval/takeaways/articles.parquet \
--output data/eval/takeaways/prod_takeaways.parquet \
--llm-id "$PROD_LLM_ID" \
--llm-base-url "$PROD_LLM_BASE_URL"
4) Compare reference vs production takeaways (semantic similarity)
This is evaluation A.
poetry run python -m eval.takeaways.compare_takeaways \
--ref data/eval/takeaways/ref_takeaways.parquet \
--prod data/eval/takeaways/prod_takeaways_model_a.parquet \
--prod data/eval/takeaways/prod_takeaways_model_b.parquet \
--markdown-output data/eval/takeaways/compare_takeaways.md
5) Score takeaways with an LLM judge
This is evaluation B.
poetry run python -m eval.takeaways.llm_as_judge \
--input data/eval/takeaways/prod_takeaways.parquet \
--output data/eval/takeaways/prod_takeaways_judged.parquet \
--llm-id gpt-5-nano \
--llm-base-url https://api.openai.com/v1/ \
--llm-api-key "$OPENAI_API_KEY"
6) Compare multiple LLM-as-a-judge evaluations
This step only reads Parquet files with scores from step 5) and computes summary statistics and score distributions for evaluation B.
poetry run python -m eval.takeaways.report_llm_as_judge \
--input data/eval/takeaways/prod_takeaways_model_a_judged.parquet \
--input data/eval/takeaways/prod_takeaways_model_b_judged.parquet \
--markdown-output data/eval/takeaways/report_llm_as_judge.md
Quality checks
Run formatting and tests:
poetry run black --check .
poetry run mypy
poetry run pytest
License
GNU GPL v3.