A wide range of datasets from global exchanges and other sources. Reference historical data through a single request or a scheduled review of the chosen dataset. Every dataset has a minimum of 5 years of history and is updated daily. We offer comprehensive global data and specialist China coverage.
Global data
A minimum of 5 years of historical data updated daily with the ability to extract specific sections, e.g. MD&A. Public news and press releases collected from exchanges including:
Country | Exchanges | Languages |
---|---|---|
U.S. | EDGAR filings covering NYSE, Nasdaq and OTC markets | English |
Canada | Toronto Stock Exchange, TSX Venture Exchange | English |
U.K. | London Stock Exchange | English |
China | Shanghai, Shenzhen, Hong Kong and Taiwan exchanges | Chinese simplified, Chinese traditional, English |
Japan | Tokyo Stock Exchange | Japanese |
South Korea | Korea Exchange | Korean |
Australia | Australian Securities Exchange | English |
Singapore | Singapore Exchange | English |
Germany | Frankfurt Stock Exchange | German |
France | Euronext Paris | French |
Israel | Tel Aviv Stock Exchange | Hebrew |
Netherlands | Amsterdam Exchange | English |
Finland | Nasdaq Helsinki | English |
Denmark | Nasdaq Copenhagen | English |
Iceland | Nasdaq Iceland | English |
All company filings for listed companies are captured across the exchanges we cover. Filings include reports such as:
Annual reports
Investor Notification
Mergers and Acquisitions
Offering of Securities
Quarterly Reports
IPO Filings
Buyback
Tender offer
Financial statements
Meetings
Proxy filings
Insider Trading
We have a wide range of predefined functional, NLP and machine learning algorithms (English/Chinese) that draw insight from the selected datasets. Our algorithms include, but are not limited to:
- Functional: Sentiment, Event detection, Theme detection, Summarisation, Article/Phrase Clustering, Keyword Extraction
- NLP Pre-processing: NER, Part of Speech Analysis
- Machine Learning: BERT (Bidirectional Encoder Representations from Transformers), Deep Learning, Integrated Training
China data
China data coverage
- Order flow data
- Service Earning call transcripts
- Public disclosures
- Executive official response
- Issuers
- Instrument
- Company relationships
- Equity top 10 shareholders
- Fund Holdings
Social media coverage
We have extensive social media coverage including Baidu, WeChat, TikTok and Weibo.
Comprehensive and high-quality data
China public filings – We have full public filings coverage of the Shanghai, Shenzhen and Hong Kong exchanges since 2016, updated daily. We retain all original PDF files, with text versions also available.
China A share earning call transcripts – We have full coverage of Shanghai and Shenzhen stocks since 2013. All original documents are available in structured JSON format, along with English translations. See the ‘Focus on’ feature further down this page for details.
China golden data – We have full coverage of Shanghai, Shenzhen and Hong Kong exchanges with key metadata for all listed companies, related entities, people and social media accounts.
We maintain search index APIs for Baidu, WeChat, TikTok and Weibo.
We provide comprehensive official/regulatory documents that relate to and affect securities, both directly and indirectly.
Information extraction – we have a number of detailed NLP algorithms and sophisticated calculation platforms which accurately extract key required information from documents in a variety of ways.
The language barrier – We have developed an optimised Chinese–English translation engine tailored specifically to the financial industry, designed to be more accurate than generic, non-industry-specific platforms.
Focus on: China A Transcripts
Data types
Each report (in JSON format) is delivered together with its metadata file (in CSV format) in a zip folder. English and Chinese versions of the same report are delivered in two separate zip folders. Data files are organised into folders by YYYY/MM.
Each transcript is in JSON format and uses the following filename convention: DATE_TIME_TRANSCRIPT-UNIQUE-ID_LANGUAGE.json
The metadata file is named metadata.csv.
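As a rough illustration of the filename convention above, the sketch below splits a transcript filename into its four parts. It assumes that DATE, TIME and LANGUAGE never contain underscores (the unique ID may); the function name, the dataclass and the sample filename in the usage note are hypothetical and are not part of the delivered data.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class TranscriptName:
    """Parts of a DATE_TIME_TRANSCRIPT-UNIQUE-ID_LANGUAGE.json filename."""
    date: str
    time: str
    transcript_unique_id: str
    language: str


def parse_transcript_filename(path: str) -> TranscriptName:
    """Split a transcript filename into its parts.

    Assumption: DATE, TIME and LANGUAGE contain no underscores, so the first
    two underscores and the last underscore are the separators.
    """
    name = PurePosixPath(path).name  # drop any YYYY/MM folder prefix
    if not name.endswith(".json"):
        raise ValueError(f"not a JSON transcript file: {name!r}")
    stem = name[: -len(".json")]

    # The language is whatever follows the last underscore.
    head, sep, language = stem.rpartition("_")
    # DATE and TIME are the first two fields; the rest is the unique ID.
    parts = head.split("_", 2)
    if not sep or len(parts) != 3 or not all(parts) or not language:
        raise ValueError(f"unexpected filename layout: {name!r}")

    date, time, unique_id = parts
    return TranscriptName(date, time, unique_id, language)
```

For example, a hypothetical file `2021/03/20210315_153000_abc-123_en.json` would be split into the date `20210315`, the time `153000`, the unique ID `abc-123` and the language `en`. The actual date, time and language formats should be checked against a real delivery.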
Reports are then classified into one of the three types below:
ECM: Earning Call Transcripts – transcripts of the earning call meetings held after annual or quarterly reports are released.
BRD: Public disclosures on broker on-site research – by regulation, any broker or investment bank that carries out on-site research at a listed company is obliged to disclose this publicly. Because this is an ad hoc activity, not every company has such disclosures; roughly one third of the whole universe does.
OQA: Executive official responses on online platforms – a small number of online forums allow investors to put questions directly to the directors of listed companies, and executives may reply selectively. These exchanges can happen in real time, providing the most up-to-date insights into the listed companies.
JSON files for ECM and BRD transcripts contain the following keys:
stockcode – stock ticker for local exchange
exchangecode – MIC code of Shanghai stock exchange (XSHG) and Shenzhen stock exchange (XSHE)
typesOfInvestorRelationsActivities – ecm or brd (ecm for an earning call transcript, brd for a broker research disclosure)
transcriptuniqueid – our internal unique ID
transcripttitle – transcript title
nameOfParticipatingUnitAndPersonnel – name of investor participants
time: {
start_time – meeting start date and time
end_time – meeting end date and time, optional
published – date and time when the report is available on exchange website
}
Location – meeting location
listedCompanyReceptionistName – name of company participants
content: {
contented – internal unique ID
contenttype – statement for the opening statement from the company; question and answer for the Q&A sessions
content – detailed transcript
}
versioned – version of the transcript; defaults to 1 initially
uploadtime – date and time when transcript file is uploaded to Orbit S3 bucket
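To show how the `contenttype` values described above might be used, here is a minimal sketch that groups transcript text by segment type. It assumes that `content` holds a list of segment objects (a single object is also accepted), each with `contenttype` and `content` keys; the exact nesting is an assumption and should be checked against a delivered file. The function name is hypothetical.

```python
from collections import defaultdict


def group_segments(transcript: dict) -> dict[str, list[str]]:
    """Group segment text by contenttype (statement, question, answer).

    Assumption: transcript["content"] is a list of segment objects, or a
    single segment object, each carrying "contenttype" and "content".
    """
    segments = transcript.get("content", [])
    if isinstance(segments, dict):
        # Tolerate a single segment object instead of a list.
        segments = [segments]

    grouped: dict[str, list[str]] = defaultdict(list)
    for segment in segments:
        kind = segment.get("contenttype", "unknown")
        grouped[kind].append(segment.get("content", ""))
    return dict(grouped)
```

Calling `group_segments(transcript)["question"]` would then return the text of every question segment in document order, which is a convenient starting point for pairing questions with answers.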
JSON files for OQA transcripts contain the following keys:
stockcode – stock ticker for local exchange
exchangecode – MIC code of Shanghai stock exchange (XSHG) and Shenzhen stock exchange (XSHE)
typesOfInvestorRelationsActivities – default to OQA
transcriptuniqueid – our internal unique ID
date – date of the online Q&A, based on the answer date
versioned – version of the transcript; defaults to 1 initially
content: [
{
question: {
question_id – internal unique ID
platform – name of the source platform
speak_user_name – person name
speak_time – date and time the person asked the question
content_cn/en – statement
reply: [
{
reply_id – internal unique ID
speak_time – date and time the person answered the question
content_cn/en – content
speak_user_name – person name
}
]
The metadata.csv file contains the following columns:
Stockcode
exchangecode
uploadtime
transcript_start_date_time
transcript_end_date_time
transcriptuniqueid
file_name
title_en/title_cn
version_id
typesOfInvestorRelationsActivities
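The delivery layout described earlier (a zip folder holding JSON transcripts and a metadata.csv, arranged by YYYY/MM) can be read along the following lines. This is a sketch under stated assumptions: that the files are UTF-8 encoded, that metadata.csv has a header row, and that the columns listed above (such as `file_name`) are the metadata.csv columns. The function name is hypothetical.

```python
import csv
import io
import json
import zipfile


def load_delivery(zip_bytes: bytes) -> tuple[list[dict], dict[str, dict]]:
    """Read one delivered zip.

    Returns (metadata rows, {transcript file name: parsed JSON}).
    Assumptions: UTF-8 encoding and a header row in metadata.csv.
    """
    metadata_rows: list[dict] = []
    transcripts: dict[str, dict] = {}

    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for member in archive.namelist():
            # Members may sit under YYYY/MM/; keep only the file name.
            base = member.rsplit("/", 1)[-1]
            if base == "metadata.csv":
                # utf-8-sig also strips a byte-order mark if one is present.
                text = archive.read(member).decode("utf-8-sig")
                reader = csv.DictReader(io.StringIO(text, newline=""))
                metadata_rows.extend(reader)
            elif base.endswith(".json"):
                transcripts[base] = json.loads(archive.read(member).decode("utf-8"))

    return metadata_rows, transcripts
```

Each metadata row can then be matched to its transcript through the `file_name` column, for example `transcripts.get(row["file_name"])`, assuming that column stores the bare file name rather than a full path.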
Take a look at the way we leverage unstructured data through additional products and services
Bespoke data extraction from unstructured documents. Example use cases for ESG and compliance.
Example users: Data vendors, exchanges, start-ups.
Enterprise search engine for research and personalised investment insights and opportunities.
Example users: Portfolio managers, research analysts, risk & compliance.