
OpenAI Livestream Event
OpenAI will hold a livestream event. Specific announcements, product launches, or demos will be revealed during the stream.
The last time OpenAI held a surprise livestream, they dropped GPT-4 Turbo and then changed all their pricing overnight.
![SGOCR: A Spatially-Grounded OCR-Focused Pipeline and v1 Dataset [P]](/fallback/opinions-parchment-2.jpg)
Hello everyone! I've been independently researching and developing small-but-powerful vision-language models (VLMs) and noticed a gap in visual datasets: none were teaching my model to simply ground text in imagery; instead, they tried to get it to reason about the text or about the scene itself. This led me down a two-week side-side-project to create SGOCR, an open-source dataset pipeline for generating spatially-grounded, OCR-focused VQA tuples with tons of rich metadata to support diverse VLM training.