ScaleText

Authors

ŘEHŮŘEK Radim, POMIKÁLEK Jan

Year of publication 2017
Type Software
Web Project repository (non-public; access on request, subject to signing an NDA)
Description ScaleText version 1.0 is a production-grade software system for scalable semantic search over large document collections. The core of this result is a vector search engine, realized as a stand-alone software package that implements document indexing and search using vectors for text representation. The vectors are created automatically from plain text using several methods for semantic analysis: LSI, LDA, TF-IDF, Doc2vec and Stanford GloVe. Documents pass through several stages: preprocessing, segmentation, vectorization, vector encoding, and storage. Each stage is realized by a dedicated component, whose output is persisted in a backend database engine. Release 1.0 includes a full re-implementation of the entire pipeline at scale, in Python 3.5, including a set of top-level scripts for document indexing and a container architecture for deployment into production environments.
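The staged pipeline described above can be illustrated with a small, self-contained sketch. This is not ScaleText's code or API (the repository is non-public): every name here (`preprocess`, `segment`, `fit_idf`, `vectorize`, `VectorIndex`, `build_index`) is hypothetical. The sketch assumes TF-IDF as the vectorization method, reduces vector encoding to unit-length normalization, and uses an in-memory dictionary as a stand-in for the backend database.

```python
import math
import re
from collections import Counter


def preprocess(text):
    """Preprocessing stage: lowercase the text and keep alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def segment(tokens, size=50):
    """Segmentation stage: split a token list into fixed-size segments."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def fit_idf(segments):
    """Compute smoothed inverse document frequencies over all segments."""
    n = len(segments)
    df = Counter(term for seg in segments for term in set(seg))
    return {term: math.log((1 + n) / (1 + count)) + 1.0
            for term, count in df.items()}


def vectorize(tokens, idf):
    """Vectorization stage: sparse TF-IDF vector, scaled to unit length."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    """Dot product of two unit-length sparse vectors (cosine similarity)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


class VectorIndex:
    """Storage stage: an in-memory dict standing in for a database."""

    def __init__(self):
        self.store = {}

    def add(self, key, vec):
        self.store[key] = vec

    def search(self, query_vec, topn=3):
        """Return up to topn (key, score) pairs, best match first."""
        scored = [(cosine(query_vec, v), k) for k, v in self.store.items()]
        scored.sort(reverse=True)
        return [(k, s) for s, k in scored[:topn]]


def build_index(docs, segment_size=50):
    """Run every document through all stages; return the index and IDF table."""
    segments = {}
    for doc_id, text in docs.items():
        for i, seg in enumerate(segment(preprocess(text), segment_size)):
            segments[(doc_id, i)] = seg
    idf = fit_idf(list(segments.values()))
    index = VectorIndex()
    for key, seg in segments.items():
        index.add(key, vectorize(seg, idf))
    return index, idf
```

A query goes through the same preprocessing and vectorization as the indexed text, for example `index.search(vectorize(preprocess("semantic search"), idf))`, and results are keyed by `(document id, segment number)`. Swapping TF-IDF for LSI, LDA, Doc2vec, or GloVe would change only the vectorization stage in this sketch.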
