Focus Group Meeting on Historical Datasets

Advancing Collaboration Between Industry & Academia and Exploring Innovation Challenges

Thought Leadership Focus Group on Historical Datasets

Workshop Venue: Holborn Bars, 138-142 Holborn, London EC1N 2NQ

10 March 2010, 5:30pm - 7:30pm

Following the Thought Leadership in Trading Technology workshop on 20 January 2010, this will be the first meeting of the Historical Datasets focus group. Our aim is to produce a manifesto to be presented at the Trade Tech conference in April.

Theme for this Focus Group

Key Observations:

Scientists lack access to data, which often leads to the use of unrealistic or simulated data for verification of research.

Data mining for market patterns and cluster analysis requires further research and development.

The industry uses clean data. However, accurate determination of best execution and accurate historical simulation both require the data that was actually available at the time, which is often “dirty”. “Clean” data is data that has been cross-referenced and corrected for missing and incorrect values. Yet both “dirty” and “clean” data need to be extensively marshalled, and often normalized, to render them suitable for analysis, simulation and data mining. Cleaning and marshalling data is a substantial, and therefore expensive, task.
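As a rough illustration of the distinction above, the following Python sketch keeps the “dirty” ticks exactly as received and derives a separate “clean” copy by cross-referencing a second feed. The record layout, the idea of a single reference feed, and the 50% jump threshold are assumptions made for this example only; they do not describe any vendor's actual process.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Tick:
    timestamp: int          # seconds since session start (hypothetical layout)
    price: Optional[float]  # None means the value was missing in the feed


def clean(primary: List[Tick], reference: List[Tick], max_jump: float = 0.5) -> List[Tick]:
    """Return a cleaned copy of `primary`; the dirty input is left untouched."""
    ref_by_time = {t.timestamp: t.price for t in reference}
    cleaned = []
    last_good = None
    for tick in primary:
        price = tick.price
        ref = ref_by_time.get(tick.timestamp)
        if price is None:
            # Missing value: fall back to the reference feed (may also be None).
            price = ref
        elif last_good is not None and abs(price - last_good) / last_good > max_jump:
            # Suspect value: a relative jump above max_jump is replaced by the
            # reference price when one exists, otherwise kept as received.
            if ref is not None:
                price = ref
        if price is not None:
            last_good = price
        cleaned.append(replace(tick, price=price))
    return cleaned


dirty = [Tick(0, 100.0), Tick(1, None), Tick(2, 1000.0), Tick(3, 100.4)]
reference = [Tick(0, 100.0), Tick(1, 100.2), Tick(2, 100.3), Tick(3, 100.4)]
clean_ticks = clean(dirty, reference)
```

Keeping `dirty` unmodified matters here: a best-execution check or historical simulation would read the dirty series, while analysis that needs corrected values would read `clean_ticks`.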

Data cleansing is offered as a service by several vendors (for example, the Wombat – RMDS service). However, data marshalling is more difficult to provide as a service because the required formats are task-dependent.

Developing a macro behavioural model for market impact is crucial as a means to inject strategy into simulation. To achieve this, data on order books (not just prices) is required.
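To show why prices alone are not enough, the sketch below walks one side of a hypothetical order book to find the average price paid by a market buy order. It is not a market impact model; it only illustrates that the cost of a larger order depends on the depth at each level, which a price series does not record. The function name and the `(price, size)` tuple layout are assumptions for this example.

```python
from typing import List, Tuple


def walk_book(asks: List[Tuple[float, int]], quantity: int) -> float:
    """Average price paid for a market buy of `quantity`, consuming ask
    levels from the lowest price upward. `asks` holds (price, size) tuples."""
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    remaining = quantity
    cost = 0.0
    for price, size in sorted(asks):
        take = min(remaining, size)
        cost += take * price
        remaining -= take
        if remaining == 0:
            break
    if remaining > 0:
        raise ValueError("not enough depth in the book")
    return cost / quantity


asks = [(100.5, 200), (100.25, 100), (101.0, 500)]
small = walk_book(asks, 100)  # filled entirely at the best ask
large = walk_book(asks, 400)  # consumes three price levels
```

With this book, the small order pays the best ask of 100.25, while the larger order pays more on average because it exhausts the first two levels.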

Key Questions:

What are the exact/key data sets required? For what purpose? With what structure?

Can we create a taxonomy of required datasets for different purposes, including both industry and academic uses?

What do we want from a provider of benchmark datasets, and what are the obstacles to achieving this service? How could these obstacles be overcome and can we define a set of "next steps"?


We are grateful to our sponsors who have covered the costs of the room and refreshments.  We will ask two participants (selected at random) to take notes during the meeting, to avoid the need to employ a technical secretary.  As a result, this meeting is free of charge.

Please email Lucinda Kingswood if you are interested in this focus group meeting.