FilingRisk uses EDGAR metadata and machine learning to quantify reporting infrastructure strain. We bridge the gap between academic research and real-world compliance monitoring.
Our foundation is EDGAR-ReSTMT, the first public dataset of SEC financial restatements derived entirely from EDGAR metadata. No proprietary dependencies. Fully deterministic.
Models built on financial statement content miss the process signatures of filing behavior. We analyze acceptance lags, document fragmentation, and amendment frequencies as empirical proxies for reporting system quality.
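To make these proxies concrete, here is a minimal sketch that computes them per company from filing metadata. It rests on several assumptions that are not part of FilingRisk: the `Filing` record and `process_features` helper are hypothetical, acceptance lag is taken to be the number of days from period end to acceptance, fragmentation is approximated by documents per filing, and amendment frequency is the ratio of `/A` forms to original forms.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class Filing:
    # Hypothetical record shape; field names are illustrative, not EDGAR's schema.
    cik: str
    form_type: str
    period_of_report: date
    accepted: datetime
    n_documents: int


def process_features(filings):
    """Per-company process features from filing metadata (illustrative only)."""
    by_cik = {}
    for f in filings:
        by_cik.setdefault(f.cik, []).append(f)

    features = {}
    for cik, group in by_cik.items():
        # Assumption: forms ending in "/A" are amendments of an original filing.
        originals = [f for f in group if not f.form_type.endswith("/A")]
        amendments = [f for f in group if f.form_type.endswith("/A")]
        # Assumed lag definition: calendar days from period end to acceptance.
        lags = [(f.accepted.date() - f.period_of_report).days for f in originals]
        features[cik] = {
            "mean_acceptance_lag_days": sum(lags) / len(lags) if lags else None,
            # Fragmentation proxy: average number of documents per submission.
            "mean_documents_per_filing": sum(f.n_documents for f in group) / len(group),
            "amendment_ratio": len(amendments) / len(originals) if originals else None,
        }
    return features


if __name__ == "__main__":
    sample = [
        Filing("0000001", "10-K", date(2023, 12, 31), datetime(2024, 3, 1, 16, 5), 12),
        Filing("0000001", "10-K/A", date(2023, 12, 31), datetime(2024, 5, 10, 9, 0), 4),
        Filing("0000002", "10-K", date(2023, 12, 31), datetime(2024, 2, 20, 8, 30), 8),
    ]
    for cik, feats in process_features(sample).items():
        print(cik, feats)
```

Lags are measured only on original forms, so a late amendment raises the amendment ratio without also inflating the lag feature.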
Our pipeline combines form-type filtering, regex keyword matching, and NLP context disambiguation to achieve reliable, reproducible labels without black-box LLM classification.
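The three labeling stages can be sketched as follows. Everything here is an assumption for illustration: the hypothetical `label_restatement` function, the candidate form set, and the keyword pattern. The context check is a simple window heuristic that skips negated mentions and "restated" legal documents (such as an amended and restated certificate of incorporation). It stands in for the NLP disambiguation step and is not the pipeline's actual model.

```python
import re

# Assumed form-type filter: amended periodic reports and current reports.
CANDIDATE_FORMS = {"8-K", "10-K/A", "10-Q/A"}

# Keyword stage: restate, restated, restatement(s), restating.
KEYWORD = re.compile(r"\brestat(?:e|ed|ement|ements|ing)\b", re.IGNORECASE)

# Heuristic stand-ins for context disambiguation (not a trained NLP model).
NON_FINANCIAL_CONTEXT = re.compile(
    r"restated\s+(?:certificate|articles|bylaws|credit\s+agreement|plan)",
    re.IGNORECASE,
)
NEGATION = re.compile(r"\b(?:no|not|without)\b[^.]{0,30}$", re.IGNORECASE)


def label_restatement(form_type, text, window=40):
    """Return True if any keyword hit survives the context checks."""
    if form_type not in CANDIDATE_FORMS:
        return False
    for m in KEYWORD.finditer(text):
        start = max(0, m.start() - window)
        # Skip hits that refer to restated legal documents.
        if NON_FINANCIAL_CONTEXT.search(text[start:m.end() + window]):
            continue
        # Skip hits closely preceded by a negation in the same sentence.
        if NEGATION.search(text[start:m.start()]):
            continue
        return True
    return False


if __name__ == "__main__":
    print(label_restatement("10-K/A", "This amendment restates the 2022 financial statements."))
    print(label_restatement("10-K/A", "Exhibit 3.1: Amended and Restated Certificate of Incorporation."))
```

Because every stage is a fixed rule, the same input always yields the same label, which is the reproducibility property the pipeline aims for.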
We train logistic regression and random forest classifiers on millions of historical filings, using metadata to predict subsequent restatements and amendments with measurable AUC improvements.
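The sketch below shows the training and evaluation loop in dependency-free Python. It fits a logistic regression by batch gradient descent and scores it with ROC AUC, which is computed as the probability that a randomly chosen positive outranks a randomly chosen negative. Random forests are omitted for brevity. The data come from a hypothetical `make_synthetic` generator with two standardized features (imagine acceptance lag and amendment ratio), so any AUC it prints says nothing about FilingRisk's actual results.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, X):
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def roc_auc(y_true, scores):
    """P(random positive scores above random negative); ties count 0.5."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    total = 0.0
    for p in pos:
        for q in neg:
            total += 1.0 if p > q else (0.5 if p == q else 0.0)
    return total / (len(pos) * len(neg))


def make_synthetic(n, seed=0):
    """Hypothetical data: two standardized metadata features and a binary label."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        p = sigmoid(-1.0 + 1.5 * x[0] + 1.0 * x[1])
        X.append(x)
        y.append(1 if rng.random() < p else 0)
    return X, y


if __name__ == "__main__":
    X_train, y_train = make_synthetic(400, seed=1)
    X_test, y_test = make_synthetic(200, seed=2)
    w, b = fit_logistic(X_train, y_train)
    print("held-out AUC:", round(roc_auc(y_test, predict_proba(w, b, X_test)), 3))
```

Evaluating on a separately seeded hold-out set keeps the AUC estimate honest. Demonstrating an improvement would mean computing AUC for a baseline feature set on the same hold-out data and comparing the two.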
Embed our predictive scores directly into your compliance or investment workflows. Identify high-entropy filings and quantify infrastructure risk in real time.