The PredictLeads News Events Dataset captures structured signals from business-related news, helping you identify meaningful company activity across millions of organizations worldwide. We source content from over 29 carefully selected news, blog, and press release platforms, using advanced machine learning models to classify and structure each event into a standardized format. This allows platforms and teams to act on real-world business developments with speed and precision.

What’s Inside

Each News Event entry contains key metadata such as:
- Formatted Signal (e.g., Company X launches a new product)
- Signal Category (from a set of 29 distinct categories)
- Detected Company Information (domain, name, ticker)
- News Article Details (title, URL, publication date, relevant sentence)
- Event Metadata (effective date, location, funding type, job title, product name, etc.)
- Structured Fields for: financing, hiring, recognition, expansion, leadership changes, and more

These fields are available via API or flat file export, depending on your integration needs.

Supported Categories

The dataset supports 29 well-defined event types, grouped into business-friendly categories such as:
- Acquisition & Investment
- Leadership & Hiring
- Product & Innovation
- Growth & Expansion
- Recognition & Challenges

Delivery & Access

- REST API access with robust filtering
- JSONL / CSV bulk delivery
- Optional webhook integration for real-time workflows

https://docs.predictleads.com/v3/guide/news_events_dataset
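As a rough illustration of working with a JSONL bulk export of news events, the sketch below parses one JSON object per line and filters by category or company domain. The field names (`summary`, `category`, `company_domain`, `published_at`), the category values, and the sample records are hypothetical placeholders chosen for this example, not the documented PredictLeads schema; consult the linked guide for the real field names.

```python
import json
from collections import Counter

# Hypothetical sample records. The field names and category values are
# illustrative assumptions, not the documented PredictLeads schema.
SAMPLE_JSONL = """\
{"summary": "Company X launches a new product", "category": "launches", "company_domain": "x.example", "published_at": "2024-05-01"}
{"summary": "Company Y receives funding", "category": "receives_financing", "company_domain": "y.example", "published_at": "2024-05-03"}
{"summary": "Company X hires a new CFO", "category": "hires", "company_domain": "x.example", "published_at": "2024-05-07"}
"""


def parse_events(jsonl_text):
    """Parse one JSON object per non-empty line of a JSONL export."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


def filter_events(events, categories=None, domain=None):
    """Keep events that match the given categories and/or company domain."""
    selected = []
    for event in events:
        if categories is not None and event.get("category") not in categories:
            continue
        if domain is not None and event.get("company_domain") != domain:
            continue
        selected.append(event)
    return selected


events = parse_events(SAMPLE_JSONL)
x_events = filter_events(events, domain="x.example")
category_counts = Counter(event["category"] for event in events)
```

The same filtering could be pushed to the REST API's query parameters instead of being done client-side; filtering locally is simply the easiest option when the data arrives as a flat file.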
31
texts
The Technology Detections Dataset reveals which technologies are used by millions of companies worldwide. PredictLeads tracks technologies across company websites, DNS records, job descriptions, and subpages, giving you a multidimensional view of a company’s tech stack. Whether you need to know if a company uses a specific CRM, payment processor, or DevOps tool, this dataset helps uncover technology adoption trends and detect integration opportunities.

What’s Inside

Each detection includes:
- Technology Name and ID
- First Seen / Last Seen Timestamps
- Company Info (domain, name, ticker if available)
- Detection Source:
  - Found in JavaScript/HTML tags
  - Mentioned in Job Descriptions
  - Found on specific subpages (e.g. login, support)
  - Seen via DNS records
- Score: a reliability indicator
- Contextual Links to related job postings, subpages, or DNS entries

Each record is linked to the Technologies Dataset for deeper insights (e.g. pricing, description, category).

Detection Channels

We detect technologies from:
- Public website code and metadata
- Job posts mentioning specific tools or platforms
- Structured subpage analysis (e.g. /support, /login)
- DNS layer records (e.g. MX, TXT, NS)

This makes it possible to distinguish between surface-level installs and more strategic, integrated technology use.

Access & Delivery

- API: Query by company, technology, or time range
- Webhooks: Get real-time tech detection updates
- Bulk Exports: Available in CSV or JSONL format
- Can be combined with the Technologies Dataset for enrichment and classification

https://docs.predictleads.com/v3/guide/technology_detections_dataset
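One way to use the multiple detection channels is to merge all detections for the same company and technology and look at how many distinct channels agree. The sketch below does that on in-memory records. The keys (`domain`, `technology`, `source`, `first_seen`, `last_seen`), the source labels, and the sample data are assumptions made for this example rather than the dataset's actual schema.

```python
from collections import defaultdict
from datetime import date

# Hypothetical detection records. Key names and source labels are
# illustrative assumptions, not the documented PredictLeads schema.
DETECTIONS = [
    {"domain": "a.example", "technology": "ExampleCRM", "source": "html",
     "first_seen": "2023-01-10", "last_seen": "2024-06-01"},
    {"domain": "a.example", "technology": "ExampleCRM", "source": "job_description",
     "first_seen": "2023-03-02", "last_seen": "2024-04-15"},
    {"domain": "a.example", "technology": "ExampleCRM", "source": "dns",
     "first_seen": "2022-11-20", "last_seen": "2024-06-10"},
    {"domain": "b.example", "technology": "ExampleCRM", "source": "html",
     "first_seen": "2024-02-01", "last_seen": "2024-02-20"},
]


def summarize(detections):
    """Merge detections per (domain, technology) pair.

    Collects the set of detection sources and the earliest first-seen and
    latest last-seen dates across all channels.
    """
    summary = defaultdict(
        lambda: {"sources": set(), "first_seen": None, "last_seen": None}
    )
    for detection in detections:
        entry = summary[(detection["domain"], detection["technology"])]
        entry["sources"].add(detection["source"])
        first = date.fromisoformat(detection["first_seen"])
        last = date.fromisoformat(detection["last_seen"])
        if entry["first_seen"] is None or first < entry["first_seen"]:
            entry["first_seen"] = first
        if entry["last_seen"] is None or last > entry["last_seen"]:
            entry["last_seen"] = last
    return dict(summary)


summary = summarize(DETECTIONS)
# Number of distinct channels that reported each (domain, technology) pair.
channel_counts = {key: len(value["sources"]) for key, value in summary.items()}
```

A pair seen through several channels (here `a.example`) is a plausible candidate for "integrated" use, while a single HTML-only detection may be a surface-level install. Where to draw that line is a judgment call and is not something the dataset description specifies.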
1.3K
others
2K
videos
5K
videos
Pre-collected OCR datasets include images of natural scenes, handwritten texts, bills and documents, and test papers. The AI training data spans 20 languages, various natural environments, and diverse photographic angles.

Annotated Imagery Data

FileMarket provides a robust Annotated Imagery Data set designed to meet the diverse needs of various computer vision and machine learning tasks. This dataset is part of our extensive offerings, which also include Textual Data, Object Detection Data, Large Language Model (LLM) Data, and Deep Learning (DL) Data. Each category is meticulously crafted to ensure high-quality and comprehensive datasets that empower AI development.

Specifications:
- Data Size: 50,000 images
- Collection Environment: The images cover a wide array of real-world scenarios, including shop signs, stop boards, posters, tickets, road signs, comics, cover pictures, prompts/reminders, warnings, packaging instructions, menus, building signs, and more.
- Diversity: The dataset spans 5 languages and includes images from various natural scenes captured at multiple photographic angles (looking up, looking down, eye-level).
- Devices Used: Images are captured using cellphones and cameras, reflecting real-world usage.
- Image Parameters: All images are provided in .jpg format, and the corresponding annotation files are in .json format.
- Annotation Details: The dataset includes line-level quadrilateral bounding box annotations and text transcriptions.
- Accuracy: The error margin for each vertex of the quadrilateral bounding box is within 5 pixels, ensuring bounding box accuracy of at least 97%. The text transcription accuracy also meets or exceeds 97%.

Unique Data Collection Method: FileMarket utilizes a community-driven approach to collect data, leveraging our extensive network of over 700k users across various Telegram apps. This method ensures that our datasets are diverse, real-world applicable, and ethically sourced, with full participant consent.

This approach allows us to provide datasets that are both comprehensive and reflective of real-world scenarios, ensuring that your AI models are trained on the most relevant and diverse data available. By integrating our unique data collection method with the specialized categories we offer, FileMarket is committed to providing high-quality data solutions that support and enhance your AI and machine learning projects.
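To make the annotation format and the 5-pixel vertex tolerance more concrete, here is a minimal sketch that loads a line-level quadrilateral annotation and compares it against a reference box. The JSON layout (`image`, `lines`, `text`, `quad`) and the sample values are hypothetical, since the actual annotation schema is not given here, and treating the error margin as a per-vertex Euclidean distance is also an assumption.

```python
import json
import math

# Hypothetical annotation file content. The keys "image", "lines", "text",
# and "quad" are illustrative assumptions, not the dataset's real schema.
ANNOTATION_JSON = """
{
  "image": "sign_0001.jpg",
  "lines": [
    {"text": "OPEN 24 HOURS", "quad": [[10, 20], [210, 22], [208, 60], [9, 58]]},
    {"text": "NO PARKING", "quad": [[15, 80], [180, 80], [180, 110], [15, 110]]}
  ]
}
"""


def quad_area(quad):
    """Area of a simple polygon given as [x, y] vertices (shoelace formula)."""
    total = 0.0
    n = len(quad)
    for i in range(n):
        x1, y1 = quad[i]
        x2, y2 = quad[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def max_vertex_error(quad, reference):
    """Largest Euclidean distance between corresponding vertices."""
    return max(math.dist(p, q) for p, q in zip(quad, reference))


def within_tolerance(quad, reference, tol=5.0):
    """True if every vertex lies within `tol` pixels of its reference vertex."""
    return max_vertex_error(quad, reference) <= tol


annotation = json.loads(ANNOTATION_JSON)
texts = [line["text"] for line in annotation["lines"]]
second_quad = annotation["lines"][1]["quad"]
# A reference box shifted by (3, 4) pixels, i.e. exactly 5 pixels per vertex.
shifted = [[x + 3, y + 4] for x, y in second_quad]
```

A check like `within_tolerance` could serve as a spot check when comparing a sample of annotations against independently drawn boxes; it does not reproduce how the stated 97% accuracy figure was measured.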
50K
images