M4S: Mining Big Text Data for Semantics

The Mining Big Text Data for Semantics (M4S) workshop, to be held on October 17, 2016 in Kobe, Japan, during the 15th International Semantic Web Conference, aims to explore combinations of statistical and formal-semantics-based approaches that join the analytic depth and precision of the latter with the scalability, recall and speed of the former.

M4S focuses on two application domains, namely healthcare and finance. In both, large volumes of textual documents, which are still the predominant means of communication, coexist with extensive models in formal knowledge representation languages. Taking healthcare as an example, textual documents are still the means of communication when scholars, industrial practitioners, and authorities publish their research findings, clinical trial reports, recommendations, GxP protocols and guidelines. At the same time, very large ontologies are widely available as the outcomes of community-wide collaborations. In the finance domain, new pieces of data are produced at intervals of seconds or even milliseconds. Unambiguously defining the nuances of these data and bringing them under the regulatory powers of the authorities become essential.

The workshop intends to foster discussions and seek answers to the following research and development questions:

Call for Participation

Objectives

Recent years have seen growing interest in probabilistic and statistical methods for mining and analysing textual data, fuelled by the rapid growth in computing power and by highly efficient algorithms. By quantifying the statistical co-occurrence of words across extremely large corpora, such methods can identify various patterns in natural language and thus allow for accurate predictions in many NLP tasks.

At the same time, large-scale curated knowledge models/ontologies have been developed jointly by international collaborations and successfully applied to semantics-based content processing. Thus, there is increasing interest in formalising semantics with statistical methods.

The Mining Big Text Data for Semantics (M4S) workshop aims to explore the potential combinations of statistical and knowledge-based approaches that will help to combine the analytic depth and precision of the latter with the scalability, recall and speed of the former.

The M4S workshop will be firmly grounded in two application domains, namely healthcare and finance. In both, large volumes of textual documents, which are still the predominant means of communication, coexist with extensive models in formal knowledge representation languages.

The workshop consists of four sessions: a keynote speech, paper presentations (3-5 papers), a panel discussion, and a demo and networking session.

Motivation

The semantics of natural language, vaguely defined as it is, has been assigned a variety of canonical forms in the past two decades. Formal theories based on description logics may have encountered challenges in applications eager to jump on the Big Data bandwagon, where “precision” gives way to “speed” and “scale”. This trend is particularly evident when one tries to make sense of large uncurated text corpora. The sheer size of such text data and their informality render approaches based on formal semantics inefficient. Two typical application domains are healthcare and finance.

In both domains, there is a strong call for gleaning the best from both worlds to tackle “speed” and “ambiguity” at the same time. Taking healthcare as an example, textual documents are still the means of communication when scholars, industrial practitioners, and authorities publish their research findings, clinical trial reports, recommendations, GxP protocols and guidelines. At the same time, very large ontologies are widely available as the outcomes of community-wide collaborations. Existing efforts to combine these two worlds still lack scalability because they usually involve either rewriting free-text search queries with ontology concepts or populating ontologies with instance entities extracted using predefined linguistic patterns.

In the finance domain, new pieces of data are produced at intervals of seconds or even milliseconds. Subtle fluctuations in the data can have a much larger impact on the global financial market. Clearly defining the nuances of these data and bringing them under the regulatory powers of the authorities become essential. For instance, the Data Transparency Act mandates that every listed company publish its data in a machine-understandable format. The community, however, has failed to provide satisfactory tools to translate between textual business data and formal reporting models, which has led to strong resistance from businesses and setbacks in the implementation of the Act.

Recently, among other statistical/probabilistic tools, bag-of-words based text mining has demonstrated its advantages in processing text corpora. It ignores syntactic information and avoids computationally expensive tasks such as part-of-speech tagging and sentence structure analysis, focusing instead on simple word proximity. This makes it possible to process very large text corpora. The resulting data-driven distributional “semantics” seems to outperform many conventional approaches in detecting “analogy” among words.

The M4S workshop sees a timely opportunity to bring these two communities closer together and to examine the interplay between formal and distributional semantics in the context of Big Text Data.

Topics

Topics of interest include but are not limited to:

Important Dates

Submission 15 Jul 2016 (extended)
Notification 15 Aug 2016
Camera ready 20 Aug 2016

Submission Guidelines

M4S invites three types of submissions: technical papers, short position papers, and system demos.

Submitted papers will be peer-reviewed by at least two workshop Programme Committee members. Accepted papers will be presented at the workshop.

All papers should be written in English following the Springer conference proceedings guidelines (LNCS guidelines). Technical papers should not exceed 14 pages including bibliography and figures. Short position papers should be no more than 6 pages and should clearly state “position paper” in the title. All system demo submissions should be accompanied by a two-page description of the key features and core technologies of the system. Preferably, a link to a live demo should be made available at the time of submission.

Papers should be submitted in PDF format through EasyChair. If you experience any problems during submission, please contact the workshop co-Chairs at m4s@easychair.org.

Workshop Organizers

Co-Chairs
Programme Committee

Contact

If you have any questions, please contact one of the workshop organisers via email at