Grasp.py - Explainable AI
Grasp is a lightweight AI toolkit for Python, with tools for data mining, natural language processing (NLP), machine learning (ML) and network analysis. It has 300+ fast, essential algorithms (~25 lines of code per function) with self-explanatory names and no dependencies, bundled into one well-documented file: grasp.py (250KB). Or install it with pip, including the language models (50MB):
$ pip install git+https://github.com/textgain/grasp
Tools for Data Mining
Download stuff with download(url) (or dl), with built-in caching and logging:
Parse HTML with dom(html) into an Element tree and search it with CSS Selectors:
for e in dom(download('https://www.textgain.com'))('a[href^="http"]'): # external links
    print(e.href)
Strip HTML with plain(Element) to get a plain text string:
from collections import Counter
text = plain(dom(download('https://www.textgain.com')))
for word, count in Counter(text.split()).items():
    print(word, count)
Find articles with wikipedia(str), in HTML:
for e in dom(wikipedia('cat'))('p'): # paragraphs
    print(plain(e))
Find opinions with bluesky(str):
for post in bluesky('cats'):
    print(post.id, post.text, post.date)
Deploy APIs with App. Works with WSGI and Nginx:
app = App()

@app.route('/app')
def index(*path, **query):
    return 'Hi! %s %s' % (path, query)

app.run('127.0.0.1', 8080)
Once this app is up, go check http://127.0.0.1:8080/app?q=cat.
Tools for Natural Language Processing
Get language with lang(str) for 40+ languages and ~92.5% accuracy:
Get locations with loc(str) for 25K+ EU cities:
Get words & sentences with tok(str) (tokenize) at ~125K words/sec:
Get word polarity with pov(str) (point-of-view). Is it a positive or negative opinion?
print(pov(tok('Dumb.', language='en'))) # -0.4
- For de, en, es, fr, nl, with ~75% accuracy.
- You'll need the language models in grasp/lm.
Tag word types with tag(str) in 10+ languages using robust ML models from UD:
for word, pos in tag(tok('The cat sat on the mat.'), language='en'):
    print(word, pos)
- Parts-of-speech include NOUN, VERB, ADJ, ADV, DET, PRON, PREP, ...
- For ar, da, de, en, es, fr, it, nl, no, pl, pt, ru, sv, tr, with ~95% accuracy.
- You'll need the language models in grasp/lm.
Tag keywords with trie, a compiled dict that scans ~250K words/sec:
t = trie({'cat': 1, 'dog': 2})
for i, j, k, v in t.search('Cats & dogs!'):
    print(i, j, k, v) # start, stop, keyword, value
Get answers with gpt(). You'll need an OpenAI API key.
Tools for Machine Learning
Machine Learning (ML) algorithms learn by example. If you show them 10K spam and 10K real emails (i.e., train a model), they can predict whether other emails are also spam or not.
Each training example is a {feature: weight} dict with a label. For text, the features could be words, the weights could be word count, and the label might be real or spam.
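The idea can be made concrete without any library. Below is a toy, pure-Python illustration of {feature: weight} examples with labels (illustration only, not grasp's own training code):

```python
# Toy training data: word-count features with a label, as described above.
examples = [
    ({'free': 2, 'winner': 1}, 'spam'),
    ({'meeting': 1, 'agenda': 1}, 'real'),
]

def score(vector, label):
    # Sum the weights of the vector's features seen with this label.
    return sum(
        w for v, y in examples if y == label
          for f, w in vector.items() if f in v
    )

v = {'free': 1, 'offer': 1}
label = max(('spam', 'real'), key=lambda y: score(v, y))
print(label)  # → spam
```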
Quantify text with vec(str) (vectorize) into a {feature: weight} dict:
v1 = vec('I love cats!', features=('c3', 'w1'))
v2 = vec('I hate cats!', features=('c3', 'w1'))
- c1, c2, c3 count consecutive characters. For c2, cats - 1x ca, 1x at, 1x ts.
- w1, w2, w3 count consecutive words.
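A pure-Python sketch of what character n-gram counting means (illustration only, not grasp's vec):

```python
from collections import Counter

def ngrams(s, n):
    # Count consecutive character n-grams, e.g. c2 of 'cats' = ca, at, ts.
    return Counter(s[i:i+n] for i in range(len(s) - n + 1))

print(ngrams('cats', 2))  # → Counter({'ca': 1, 'at': 1, 'ts': 1})
```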
Train models with fit(examples), save as JSON, predict labels:
Once trained, Model.predict(vector) returns a dict with label probabilities (0.0-1.0).
Tools for Network Analysis
Map networks with Graph, a {node1: {node2: weight}} dict subclass:
g = Graph()
g.add('a', 'b') # a - b
g.add('b', 'c') # b - c
g.add('b', 'd') # b - d
g.add('c', 'd') # c - d
See networks with viz(graph):
with open('graph.html', 'w') as f:
    f.write(viz(g, src='graph.js'))
- You'll need to set src to the grasp/graph.js lib.
Tools for Comfort
Easy date handling with date(v), where v is an int, a str, or another date:
Easy path handling with cd(...), which always points to the script's folder:
Easy CSV handling with csv([path]), a list of lists of values:
for code, country in csv(cd('countries.csv')): # e.g., a two-column CSV (hypothetical file)
    print(code, country)

data = csv()
data.append(('cat', 'Kitty'))
data.append(('cat', 'Simba'))
data.save(cd('cats.csv'))
Tools for Good
A challenge in AI is bias introduced by human trainers. Remember the Model trained earlier? Grasp has tools to explain how & why it makes decisions:
In the returned dict, the model's explanation is: "you wrote hat + ate (hate)".