Full Stack Graph Machine Learning: Theory, Practice, Tools and Techniques v1.1.2
This is a course from Graphlet AI on full-stack graph machine learning taught by Russell Jurney. Contact me if you'd like to attend a session or organize one for your firm! rjurney@graphlet.ai :)
Environment Setup
This class uses a Docker image rjurney/graphml-class. To bring it up as the jupyter service along with neo4j, run:
docker compose pull
# Run a Jupyter Notebook container in the background with everything in requirements.txt installed
docker compose up -d
# Tail the Jupyter logs to see the JupyterLab url to connect in your browser
docker logs jupyter -f --tail 100
To shut down the containers, run the following from this folder:
docker compose down
docker compose vs docker-compose
You say potato, I say potahto... the docker compose command changed in recent versions :)
NOTE: older versions of Docker may use the hyphenated command docker-compose rather than the two-word command docker compose.
VSCode Setup
To edit code in VSCode you may want a local Anaconda Python environment with the class's PyPI libraries installed. This enables VSCode to parse the code, understand APIs and highlight errors.
Note: if you do not use Anaconda, consider using it :) You can use a Python 3 venv in the same way as conda.
Class Anaconda Environment
Create a new Anaconda environment:
Activate the environment:
Install the project's libraries:
VSCode Interpreter
You can use a Python environment in VSCode by typing:
Ctrl+Shift+P (Cmd+Shift+P on macOS)
to bring up the Command Palette. Now type Python or Interpreter, or if you see it, select Python: Select Interpreter. Then choose the path to your conda environment. It will include the name of the environment, such as:
Note: the Python version is set to 3.10.11 because Jupyter Stacks have not been updated more recently.
Knowledge Graph Construction in PySpark
We build a knowledge graph from the Stack Exchange Archive for the network motif section of the course.
Docker Exec Commands
To run a bash shell in the Jupyter container, type:
docker exec -it jupyter bash
Once you're there, you can run the following commands to download and prepare the data for the course.
First, download the data:
Then you will need to convert the data from XML to Parquet:
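As a rough illustration of what the XML side of that conversion involves, here is a minimal sketch using only the Python standard library. It is not the course's conversion script. Stack Exchange dumps store each record as a `<row>` element whose fields are XML attributes. The sample document, the `INT_FIELDS` set and the `parse_rows` helper below are hypothetical names chosen for the example. Writing the resulting records to Parquet would need Spark or a library such as pyarrow, which is not shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like a Stack Exchange Posts.xml dump:
# every record is a <row> element and every field is an attribute.
SAMPLE_XML = """
<posts>
  <row Id="1" PostTypeId="1" Score="5" Title="What is a motif?" />
  <row Id="2" PostTypeId="2" ParentId="1" Score="3" />
</posts>
"""

# Assumption for this sketch: these attributes should be cast to int.
INT_FIELDS = {"Id", "PostTypeId", "ParentId", "Score"}


def parse_rows(xml_text):
    """Turn each <row> element into a dict, casting known integer fields."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for row in root.iter("row"):
        record = {}
        for key, value in row.attrib.items():
            record[key] = int(value) if key in INT_FIELDS else value
        records.append(record)
    return records


records = parse_rows(SAMPLE_XML)
for record in records:
    print(record)
```

Attributes missing from a row (for example `ParentId` on a question) are simply absent from its dict, which is why a columnar format with nullable columns such as Parquet is a natural target.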
The course covers knowledge graph construction in PySpark in graphml_class.stats.graph.py.
Network Motifs with GraphFrames
This course now covers network motifs in property graphs (frequent patterns of structure) using PySpark / GraphFrames (see motif.py, no notebook yet).
It supports directed motifs, not undirected. All the 4-node motifs are outlined below. Note that GraphFrames can also filter the paths returned by its GraphFrame.find() method using any Spark DataFrame filter, enabling temporal and complex property graph motifs.
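To make the "find, then filter" idea concrete without a Spark cluster, here is a plain-Python sketch of a directed two-hop motif followed by a temporal filter. It is an illustration only, not the code in motif.py. In GraphFrames the same structural pattern would be written as a motif string such as `"(a)-[e1]->(b); (b)-[e2]->(c)"` passed to `find()`, with the filter applied to the resulting DataFrame. The edge list, the `ts` property and the `find_two_hop` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical directed property graph: (src, dst, edge properties).
EDGES = [
    ("u1", "u2", {"ts": 1}),
    ("u2", "u3", {"ts": 5}),
    ("u3", "u1", {"ts": 2}),
    ("u2", "u4", {"ts": 0}),
]


def find_two_hop(edges):
    """Return every directed path a -> b -> c as a dict of named parts.

    As with GraphFrames motifs, the named vertices are not required
    to be distinct, so a and c may be the same vertex.
    """
    outgoing = defaultdict(list)
    for src, dst, props in edges:
        outgoing[src].append((dst, props))

    matches = []
    for a, b, e1 in edges:
        for c, e2 in outgoing[b]:
            matches.append({"a": a, "e1": e1, "b": b, "e2": e2, "c": c})
    return matches


# Structural motif: all two-hop directed paths.
paths = find_two_hop(EDGES)

# Temporal motif: keep only paths whose second edge happens after the first.
temporal = [
    (m["a"], m["b"], m["c"])
    for m in paths
    if m["e1"]["ts"] < m["e2"]["ts"]
]

print(len(paths), temporal)
```

The structural match and the property filter are kept as two separate steps on purpose: the pattern describes shape only, and any predicate over vertex or edge properties can be layered on afterwards.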