AI Safety & Governance News Tracker

Python · NLP · Machine Learning · Predictive Modeling · Web App
An automated dashboard for tracking and analyzing global discourse on AI safety, alignment, and policy, built with Python, Streamlit, and GitHub Actions.
Author: Christopher Hynes

Published: July 25, 2025

Live Application | View the source code (.py) | GitHub Repository

Project Overview

As artificial intelligence becomes increasingly integrated into society, understanding the global conversation around its risks and governance is critical. This project is an automated dashboard designed to track and analyze the discourse surrounding AI safety, alignment, ethics, and policy. It moves beyond generic AI news to capture the nuanced, high-signal conversations happening in tech, research, and policy circles.

The goal was to create a tool that could provide real-time insights into key questions: What are the dominant sentiment trends? Which specific safety and governance concepts are gaining traction? And what are the narratives being pushed by different news sources?


Technical Deep Dive

This project was built as an end-to-end data pipeline, from automated data collection to interactive visualization.

1. Automated Data Pipeline

The backbone of the project is a data pipeline orchestrated by GitHub Actions.

  • Scheduled Execution: A cron schedule triggers the news_pipeline.py script every 24 hours.
  • Data Collection: The script calls the NewsAPI using a carefully constructed search query to fetch the latest articles. The query is designed to be highly specific, targeting key phrases like “AI safety”, “AI alignment”, “EU AI Act”, and associated concepts like existential risk, interpretability, and red-teaming.
  • Data Persistence: The fetched and analyzed data is appended to a central news_data.csv file within the repository, with duplicates removed to maintain a clean, growing dataset over time (see the sketch after this list).
  • Automated Commit: The GitHub Action commits the updated CSV file back to the repository, creating a version-controlled history of the dataset.
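
A minimal sketch of the collection and persistence steps (the function names and column schema are illustrative, not taken from the script; news_data.csv and the query phrases come from the description above):

```python
import requests
import pandas as pd

NEWSAPI_URL = "https://newsapi.org/v2/everything"

def fetch_articles(api_key: str, query: str) -> pd.DataFrame:
    """Fetch the latest matching articles from NewsAPI into a DataFrame."""
    resp = requests.get(
        NEWSAPI_URL,
        params={
            "q": query,  # e.g. '"AI safety" OR "AI alignment" OR "EU AI Act"'
            "language": "en",
            "sortBy": "publishedAt",
            "pageSize": 100,
            "apiKey": api_key,
        },
        timeout=30,
    )
    resp.raise_for_status()
    return pd.DataFrame(
        [
            {
                "published_at": a["publishedAt"],
                "source": a["source"]["name"],
                "title": a["title"],
                "description": a["description"],
                "url": a["url"],
            }
            for a in resp.json().get("articles", [])
        ]
    )

def append_and_dedupe(new: pd.DataFrame, path: str = "news_data.csv") -> None:
    """Append new rows to the central CSV, dropping duplicate articles by URL."""
    try:
        combined = pd.concat([pd.read_csv(path), new], ignore_index=True)
    except FileNotFoundError:
        combined = new
    combined.drop_duplicates(subset="url", keep="first").to_csv(path, index=False)
```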

2. Natural Language Processing (NLP)

  • Sentiment Analysis: To gauge the tone of the discourse, I used NLTK’s VADER (Valence Aware Dictionary and sEntiment Reasoner). VADER was chosen over simpler alternatives like TextBlob because it is specifically attuned to short, informal text such as headlines and social media posts, allowing for a more nuanced reading of titles and content snippets. Analysis is performed on the combined title and description for richer context; a minimal example follows.
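
A minimal sketch of that scoring step (the helper name and exact text handling are illustrative, not taken from the script):

```python
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download
analyzer = SentimentIntensityAnalyzer()

def score_sentiment(title: str, description: str) -> float:
    """Return VADER's compound score: -1 (most negative) to +1 (most positive)."""
    text = f"{title}. {description or ''}"  # combined title + description
    return analyzer.polarity_scores(text)["compound"]
```

The compound score is the single summary number VADER produces, which makes it a natural quantity to average and trend per day.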

  • Key Concept Extraction: To identify the most prominent topics, I used n-gram analysis. By extracting the most frequent two- and three-word phrases (bigrams and trigrams) from article titles, the dashboard can surface specific, recurring concepts like “generative ai” or “big tech” instead of single, disconnected keywords. One possible implementation is sketched below.
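
One straightforward way to implement this (the original script may use a different library) is scikit-learn's CountVectorizer with an n-gram range of (2, 3):

```python
from sklearn.feature_extraction.text import CountVectorizer

def top_ngrams(titles: list[str], k: int = 20) -> list[tuple[str, int]]:
    """Return the k most frequent bigrams/trigrams across article titles."""
    vectorizer = CountVectorizer(ngram_range=(2, 3), stop_words="english")
    counts = vectorizer.fit_transform(titles)
    totals = counts.sum(axis=0).A1            # total occurrences of each n-gram
    vocab = vectorizer.get_feature_names_out()
    ranked = sorted(zip(vocab, totals), key=lambda pair: pair[1], reverse=True)
    return [(phrase, int(n)) for phrase, n in ranked[:k]]
```

Dropping English stop words keeps the surfaced phrases content-bearing (“generative ai”) rather than grammatical filler.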

3. Interactive Dashboard

The frontend is a Streamlit application designed for clarity and interactivity.

  • Data Visualization: Charts are generated with Plotly Express to show sentiment trends over time and sentiment distributions.
  • Dynamic Filtering: Users can filter the entire dashboard by time period (e.g., Last 7 Days, Last 30 Days), allowing for both short-term and long-term analysis.
  • Deep Dive Feature: The dashboard allows users to select a key concept (n-gram) and instantly see all the articles related to it, providing a seamless way to explore the data. A condensed sketch of this logic follows.
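
A condensed sketch of the filtering and charting logic (widget labels, concept options, and the sentiment column name are illustrative assumptions):

```python
import pandas as pd
import plotly.express as px
import streamlit as st

df = pd.read_csv("news_data.csv")
df["published_at"] = pd.to_datetime(df["published_at"], utc=True)

# Dynamic time-period filter applied to the whole dashboard
period = st.selectbox("Time period", ["Last 7 Days", "Last 30 Days", "All Time"])
days = {"Last 7 Days": 7, "Last 30 Days": 30}.get(period)
if days is not None:
    df = df[df["published_at"] >= pd.Timestamp.now(tz="UTC") - pd.Timedelta(days=days)]

# Sentiment trend over time (assumes a per-article 'sentiment' column)
daily = df.groupby(df["published_at"].dt.date)["sentiment"].mean().reset_index()
st.plotly_chart(px.line(daily, x="published_at", y="sentiment",
                        title="Average daily sentiment"))

# Deep dive: list every article mentioning a selected key concept
concept = st.selectbox("Key concept", ["ai safety", "eu ai act"])
st.dataframe(df[df["title"].str.contains(concept, case=False, na=False)])
```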

Challenges & Future Work

  • API Limitations: The current implementation relies on the NewsAPI’s free tier, which provides article snippets rather than full text, limiting the depth of the NLP analysis. A future enhancement would be to integrate full-text extraction (e.g., newspaper3k, or requests with BeautifulSoup) so the analysis runs on complete article content for greater accuracy; a rough sketch follows.
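
As a rough illustration of that enhancement (not part of the current pipeline; error handling omitted), newspaper3k can pull the full body text from a stored article URL:

```python
from newspaper import Article

def full_text(url: str) -> str:
    """Download and parse the complete article body for deeper NLP analysis."""
    article = Article(url)
    article.download()
    article.parse()
    return article.text
```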

  • Advanced Topic Modeling: While n-grams are effective for headlines, a more advanced topic modeling technique like BERTopic could be applied to full-text data to uncover subtler, semantically related themes (sketched below).
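
Fitting BERTopic on such a corpus would look roughly like this (a sketch under the assumption that full texts are available, e.g. via the scraper above):

```python
from bertopic import BERTopic

def model_topics(docs: list[str]) -> BERTopic:
    """Fit BERTopic on full-text documents and return the fitted model."""
    topic_model = BERTopic(min_topic_size=10)
    topic_model.fit_transform(docs)
    return topic_model

# model = model_topics(full_texts)
# model.get_topic_info()  # one row per discovered theme, with representative terms
```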


Conclusion

This project demonstrates an end-to-end data science workflow, from automated data acquisition and NLP analysis to deployment of an interactive web application. By focusing on the strategic domain of AI safety and governance, it serves as a valuable tool for anyone looking to understand the key trends and narratives shaping the future of artificial intelligence.