Skip to content

TellMeFirst/tmfcore

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

TellMeFirst's core

This repository contains the core of TellMeFirst.

TellMeFirst is a tool for classifying and enriching textual documents via Linked Open Data. It uses Lucene indexes for its classification and enrichment system. To build such indexes use our fork of the DBpedia Spotlight project.

The core of TellMeFirst is the lowest level component that directly interacts with Lucene.

Use the API exported by this module as follows. See also how the API is used by tmfcore_build_cli and by tmfcore_build_war.

API Initialization

First you need to initialize the settings and the index using the following code:

TMFVariables variables = new TMFVariables("/path/to/config/file");
IndexesUtil.init();

API Usage

Once you have initialized TMF's core, as described above, you can invoke classify() to classify text.

//
// Here `text` is a String, `numTopics` is a integer and `language`
// is again a String (typically either "en" or "it").
//
Classifier classifier = new Classifier(language);
List<String[]> res = classifier.classify(text, numTopics);

The classify() function follows the traditional TMF policy by which large texts are divided in chunks classified separately, and the result is generated merging the classification of each chunk of text.

You can bypass this policy by using the classifyShortText() function that directly passes the text to Lucene. Note, however, that depending on the Lucene configuration and on the text length, this call may raise an exception if the resulting Lucene query is too large.

Classifier classifier = new Classifier(language);
List<String[]> res = classifier.classifyShortText(text, numTopics);