Hey r/datasets community!
I'm thrilled to share an exciting new resource for all you data enthusiasts, researchers, and finance aficionados out there: https://github.com/sovai-research/open-investment-datasets
What's New?
Sov.ai has just launched the Open Investment Data Initiative! We're building the industry's first open-source investment datasets tailored for rigorous research and innovative projects. Whether you're into AI, ML, quantitative finance, or just love diving deep into financial data, this is for you.
Free Access with a 6-Month Lag
All 20 of our current datasets will be available for free, with a 6-month lag, for non-commercial research purposes. This means you can access high-quality, ticker-linked data without breaking the bank. For commercial use, we offer a subscription plan that makes premium data affordable (more on that below).
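For anyone wondering what a 6-month lag means in practice, here is a rough sketch of how you might estimate the most recent date to expect in the free tier. It assumes the lag is counted in calendar months back from today, which this post does not actually specify, and `free_tier_cutoff` is a hypothetical helper name, not part of any Sov.ai package.

```python
from datetime import date
import calendar


def free_tier_cutoff(today: date, lag_months: int = 6) -> date:
    """Return the date `lag_months` calendar months before `today`.

    Assumption: the lag is measured in calendar months. The real
    cutoff rule used by the data provider may differ.
    """
    # Work with a zero-based month count so the arithmetic is simple.
    total_months = today.year * 12 + (today.month - 1) - lag_months
    year, month_index = divmod(total_months, 12)
    month = month_index + 1
    # Clamp the day for shorter months (for example 31 Aug -> 28 Feb).
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(today.day, last_day))


if __name__ == "__main__":
    print(free_tier_cutoff(date.today()))
```

You could compare this date against the latest date in a downloaded dataset to sanity-check how fresh the free data is.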
What We Offer
By the end of 2026, Sov.ai aims to provide 100+ investment datasets, including but not limited to:
News Sentiment: Ticker-matched and theme-matched sentiment analysis from various news sources.
Price Breakout Predictions: Daily updates predicting upward price movements for US equities.
Insider Flow Prediction: Over 60 insider trading features ideal for machine learning models.
Institutional Trading: In-depth analysis of institutional investment behaviors and strategies.
Lobbying Data: Detailed data on corporate lobbying activities, linked to specific tickers.
Pharma Clinical Trials: Unique dataset tagging clinical trials with predicted success outcomes.
Corporate Risks: Bankruptcy predictions (Chapter 7 & 11) for over 13,000 US publicly traded stocks.
…and many more!
Get Involved!
We're looking for firms and individuals to join us as co-architects or sponsors on this journey. Your support can help us expand our offerings and maintain the quality of our data. Interested? Reach out to us here or connect via our LinkedIn, GitHub, and Hugging Face profiles.
Example Use Cases
Here's how easy it is to get started with our datasets using the Hugging Face datasets library:
from datasets import load_dataset
# Example: Load News Sentiment Dataset
df_news_sentiment = load_dataset("sovai/news_sentiment", split="train").to_pandas()
# Example: Load Price Breakout Dataset
df_price_breakout = load_dataset("sovai/price_breakout", split="train").to_pandas()
# Add more datasets as needed…
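Once a dataset is in pandas, a common next step is narrowing it to one ticker and a date range. Here's a minimal sketch of that. The column names `ticker`, `date`, and `sentiment` are assumptions (check `df.columns` on the real data), `filter_ticker` is a hypothetical helper, and the demo rows are made up purely so the snippet runs without downloading anything.

```python
import pandas as pd


def filter_ticker(df, ticker, start=None, ticker_col="ticker", date_col="date"):
    """Return rows for one ticker, optionally from `start` onward, sorted by date.

    Column names are assumptions; pass the real ones via
    `ticker_col` and `date_col` if they differ.
    """
    out = df[df[ticker_col] == ticker].copy()
    # Parse dates so comparisons and sorting behave chronologically.
    out[date_col] = pd.to_datetime(out[date_col])
    if start is not None:
        out = out[out[date_col] >= pd.Timestamp(start)]
    return out.sort_values(date_col).reset_index(drop=True)


# Made-up rows for illustration only, not real Sov.ai data.
demo = pd.DataFrame(
    {
        "ticker": ["AAPL", "MSFT", "AAPL"],
        "date": ["2024-03-01", "2024-03-01", "2024-01-15"],
        "sentiment": [0.2, -0.1, 0.5],
    }
)

aapl_recent = filter_ticker(demo, "AAPL", start="2024-02-01")
print(aapl_recent)
```

With the real data you would pass something like `df_news_sentiment` in place of `demo`, adjusting the column arguments to match.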
submitted by /u/OppositeMidnight