Texthero was developed by Jonathan Besomi, while a member of the TIS lab, as an efficient way to work with text-based datasets. It is simple to learn and integrates with Pandas. Texthero is free, open-source, and well documented.

Texthero includes tools for:

• Preprocessing text data

• NLP for keyphrase and keyword extraction

• NLP for named entity recognition

• Text representation: TF, TF-IDF, and custom embeddings

• Vector space analysis & topic modeling

• Document clustering (K-means, Mean-shift, DBSCAN, Hierarchical)

• Text visualization and vector space visualization
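To make the "text representation" item above more concrete, here is a minimal sketch of TF-IDF weighting in plain Python. It is not Texthero's API or implementation: the function names `tokenize` and `tfidf`, the crude alphabetic tokenizer, and the unsmoothed `tf * log(N / df)` formula are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only (a crude preprocessing step)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents in which each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        vectors.append(
            {t: (c / total) * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        )
    return vectors


# Hypothetical toy corpus, for illustration only.
docs = ["Patents cite other patents.", "Firms file patents.", "Firms hire workers."]
vectors = tfidf(docs)
```

Each resulting dictionary is a sparse vector for one document. A term that appears in every document would receive a weight of zero under this unsmoothed formula, which is one reason production libraries usually apply smoothing.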

Download Texthero:


An interactive method for matching business names between lists, with functions for corporate name cleaning, fuzzy matching, and false-positive/false-negative inspection.


Available on request
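As a rough illustration of the business-name matching idea described above, the sketch below cleans corporate names and then scores candidate matches with the standard library's `difflib.SequenceMatcher`. It does not reproduce the tool itself: the suffix list, the function names `clean_name` and `best_match`, and the 0.85 threshold are hypothetical choices.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal-form suffixes to strip; a real list would be longer.
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "co", "ltd", "llc", "company"}


def clean_name(name):
    """Lowercase, replace punctuation with spaces, and remove legal-form suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def best_match(name, candidates, threshold=0.85):
    """Return (candidate, score) for the closest cleaned name, or (None, score) below threshold.

    Assumes `candidates` is non-empty.
    """
    cleaned = clean_name(name)
    scored = [
        (SequenceMatcher(None, cleaned, clean_name(c)).ratio(), c) for c in candidates
    ]
    score, match = max(scored)
    return (match, score) if score >= threshold else (None, score)


# Hypothetical example names.
candidates = ["ACME Widgets Corporation", "Apex Gadgets LLC"]
result = best_match("Acme Widgets, Inc.", candidates)
```

Raising the threshold reduces false positives at the cost of more false negatives, so pairs scoring close to the threshold are the natural ones to inspect by hand.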


An environment for grid-world games in experimental economics. Supports games for both human subjects and reinforcement learning (Q-Learning) simulations.


Available on request
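For readers unfamiliar with Q-Learning in a grid world, the following is a minimal tabular sketch on a 3x3 grid. It is not the environment described above: the grid size, rewards, hyperparameters, and function names (`step`, `train`, `greedy_path`) are all assumptions made for illustration.

```python
import random

SIZE = 3
START, GOAL = (0, 0), (2, 2)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up as (row, col) deltas


def step(state, action):
    """Move within the grid (walls block movement); reaching GOAL pays 1, other moves cost 0.01."""
    row = min(max(state[0] + action[0], 0), SIZE - 1)
    col = min(max(state[1] + action[1], 0), SIZE - 1)
    next_state = (row, col)
    reward = 1.0 if next_state == GOAL else -0.01
    return next_state, reward, next_state == GOAL


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a Q-table with epsilon-greedy exploration and the standard Q-Learning update."""
    rng = random.Random(seed)
    n_actions = len(ACTIONS)
    q = {((r, c), a): 0.0 for r in range(SIZE) for c in range(SIZE) for a in range(n_actions)}
    for _ in range(episodes):
        state = START
        for _ in range(50):  # cap on episode length
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: q[(state, i)])
            next_state, reward, done = step(state, ACTIONS[a])
            best_next = 0.0 if done else max(q[(next_state, i)] for i in range(n_actions))
            # Move Q(s, a) toward the target: reward + gamma * max over a' of Q(s', a').
            q[(state, a)] += alpha * (reward + gamma * best_next - q[(state, a)])
            state = next_state
            if done:
                break
    return q


def greedy_path(q, max_steps=20):
    """Follow the highest-valued action from START until GOAL or the step cap."""
    state, path = START, [START]
    for _ in range(max_steps):
        a = max(range(len(ACTIONS)), key=lambda i: q[(state, i)])
        state, _reward, done = step(state, ACTIONS[a])
        path.append(state)
        if done:
            break
    return path


q_table = train()
path = greedy_path(q_table)
```

The same `step` function could in principle serve a human player and a simulated agent alike, which is the appeal of sharing one environment between both kinds of experiment.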


A Pythonic front-end for the USPTO PatentsView datasets, with supplemental data from “Patent-to-Patent Similarity: A Vector Space Model”.


Available on request


A simple module to secure confidential information (such as logins and passwords) directly within regular Python text, without exposing the information on deployment.


Copyright 2017-2020. All software is provided for non-commercial use, subject to a Creative Commons Attribution-NonCommercial-NoDerivatives license. No co‐authorship is required to use the software in academic research – please just cite the author and source.