Graphlet-AI/graphml-class
Full Stack Graph Machine Learning: Theory, Practice, Tools and Techniques v1.1.2

This is a course from Graphlet AI on full-stack graph machine learning taught by Russell Jurney. Contact me if you'd like to attend a session or organize one for your firm! rjurney@graphlet.ai :)

Environment Setup

This class uses the Docker image rjurney/graphml-class. To bring it up as the jupyter service, along with neo4j, run:

# Pull the Docker images BEFORE class starts, or it can take a while on a shared connection
docker compose pull

# Run a Jupyter Notebook container in the background with all requirements.txt installed
docker compose up -d

# Tail the Jupyter logs to see the JupyterLab url to connect in your browser
docker logs jupyter -f --tail 100
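If you would rather not scan the logs by eye, the tokenized URL can be pulled out with a few lines of Python. This is a minimal sketch: the helper names are hypothetical, and it assumes the Jupyter server prints a URL containing `?token=` followed by a hexadecimal token, so adjust the pattern if your logs look different.

```python
import re
import subprocess

# Assumption: the login URL in the logs ends with ?token=<hex digits>.
TOKEN_URL = re.compile(r"https?://\S+\?token=[0-9a-f]+")


def find_jupyter_urls(log_text):
    """Return every tokenized URL found in the given log text, in order."""
    return TOKEN_URL.findall(log_text)


def read_jupyter_logs(container="jupyter", tail=100):
    """Fetch recent container logs (same as `docker logs`, without -f)."""
    result = subprocess.run(
        ["docker", "logs", container, "--tail", str(tail)],
        capture_output=True,
        text=True,
        check=True,
    )
    # Docker replays the container's stdout and stderr separately, so search both.
    return result.stdout + result.stderr


if __name__ == "__main__":
    for url in find_jupyter_urls(read_jupyter_logs()):
        print(url)
```

Open any of the printed URLs in your browser to reach JupyterLab.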

To shut down the containers, run this from the repository folder:

docker compose down

docker compose vs docker-compose

You say potato, I say potahto... the docker compose command changed in recent versions :)

NOTE: older versions of Docker may use the command docker-compose rather than the two-word command docker compose.
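A script that has to work with either spelling can check which one is installed before running anything. The sketch below is only an illustration (the function names are hypothetical); it prefers the two-word form and falls back to docker-compose.

```python
import shutil
import subprocess


def compose_command(has_plugin, has_legacy):
    """Pick the compose invocation, preferring the two-word form."""
    if has_plugin:
        return ["docker", "compose"]
    if has_legacy:
        return ["docker-compose"]
    raise RuntimeError("Neither 'docker compose' nor 'docker-compose' was found")


def detect_compose_command():
    """Probe the local machine for the available compose command."""
    has_plugin = False
    if shutil.which("docker") is not None:
        # 'docker compose version' exits non-zero when the plugin is missing.
        probe = subprocess.run(
            ["docker", "compose", "version"], capture_output=True
        )
        has_plugin = probe.returncode == 0
    has_legacy = shutil.which("docker-compose") is not None
    return compose_command(has_plugin, has_legacy)


if __name__ == "__main__":
    print(" ".join(detect_compose_command()))
```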

VSCode Setup

To edit code in VSCode you may want a local Anaconda Python environment with the class's PyPI libraries installed. This will enable VSCode to parse the code, understand APIs and highlight errors.

Note: if you do not use Anaconda, consider using it :) You can use a Python 3 venv in the same way as conda.

Class Anaconda Environment

Create a new Anaconda environment:

conda create -n graphml python=3.10.11 -y

Activate the environment:

conda activate graphml

Install the project's libraries:

poetry install

VSCode Interpreter

You can use a Python environment in VSCode by typing:

SHIFT-CMD-P

to bring up the command search window (on Windows and Linux the shortcut is CTRL-SHIFT-P). Type Python or Interpreter, then select Python: Select Interpreter when it appears. Now choose the path to your conda environment. It will include the name of the environment, such as:

Python 3.10.11 ('graphml') /opt/anaconda3/envs/graphml/bin/python

Note: the Python version is set to 3.10.11 because Jupyter Stacks have not been updated more recently.
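To confirm that VSCode picked up the right interpreter, you can run a quick check in its integrated terminal or in a notebook cell. This is a small sketch with hypothetical helper names; it only compares the major and minor version against the 3.10 series used by the class.

```python
import sys


def matches_class_version(version_info=sys.version_info, expected=(3, 10)):
    """Return True when the interpreter's major.minor matches the class's."""
    return tuple(version_info[:2]) == expected


if __name__ == "__main__":
    # The path should point inside your 'graphml' conda environment.
    print("Interpreter:", sys.executable)
    print("Version:", ".".join(str(part) for part in sys.version_info[:3]))
    if not matches_class_version():
        print("Warning: this is not a Python 3.10 interpreter")
```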

Knowledge Graph Construction in PySpark

We build a knowledge graph from the Stack Exchange Archive for the network motif section of the course.

Docker Exec Commands

To run a bash shell in the Jupyter container, type:

docker exec -it jupyter bash

Once you're there, you can run the following commands to download and prepare the data for the course.

First, download the data:

graphml_class/stats/download.py stats.meta

Then you will need to convert the data from XML to Parquet:

spark-submit --packages "com.databricks:spark-xml_2.12:0.18.0" graphml_class/stats/xml_to_parquet.py
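For orientation, the conversion can be sketched as follows. This is not the contents of xml_to_parquet.py. It assumes the Stack Exchange dump stores each record as a `<row>` element whose fields are XML attributes, and that spark-xml is left at its default of prefixing attribute columns with an underscore. The function names and paths are hypothetical.

```python
def strip_attribute_prefix(columns, prefix="_"):
    """Drop the prefix spark-xml adds to columns that came from XML attributes."""
    return [c[len(prefix):] if c.startswith(prefix) else c for c in columns]


def xml_to_parquet(spark, xml_path, parquet_path):
    """Read one dump file of <row .../> elements and write it out as Parquet.

    `spark` is an existing SparkSession started with the spark-xml package,
    as in the spark-submit command above.
    """
    df = (
        spark.read.format("com.databricks.spark.xml")
        .option("rowTag", "row")  # one record per <row> element
        .load(xml_path)
    )
    # Rename _Id, _Body, ... to Id, Body, ... for friendlier column names.
    df = df.toDF(*strip_attribute_prefix(df.columns))
    df.write.mode("overwrite").parquet(parquet_path)
```

Parquet keeps the schema and column types with the data, so later Spark jobs can read the files without parsing XML again.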

The course covers knowledge graph construction in PySpark in graphml_class/stats/graph.py. To run it:

spark-submit graphml_class/stats/graph.py

Network Motifs with GraphFrames

This course now covers network motifs (frequently recurring structural patterns) in property graphs using PySpark and GraphFrames (see motif.py; there is no notebook yet). It supports directed motifs but not undirected ones. All the 4-node motifs are outlined below. Note that the paths returned by the GraphFrames find() method can be filtered with any Spark DataFrame filter, which enables temporal and complex property graph motifs.
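As a rough illustration of combining a motif with a DataFrame filter, the sketch below builds the pattern string for a directed path and then keeps only the paths whose edges occur in time order. It is not taken from motif.py. The helper names are hypothetical, and the edge column `CreationDate` is an assumption about your edge schema. `g` is expected to be a GraphFrame.

```python
def path_motif(num_nodes):
    """Build a GraphFrames motif string for a directed path over num_nodes nodes."""
    names = "abcdefghijklmnopqrstuvwxyz"
    if not 2 <= num_nodes <= len(names):
        raise ValueError("num_nodes must be between 2 and 26")
    hops = [
        f"({names[i]})-[e{i + 1}]->({names[i + 1]})"
        for i in range(num_nodes - 1)
    ]
    return "; ".join(hops)


def find_temporal_paths(g, num_nodes=4, time_column="CreationDate"):
    """Find directed paths whose consecutive edges are in increasing time order.

    time_column is a hypothetical edge attribute; substitute your own.
    """
    paths = g.find(path_motif(num_nodes))  # DataFrame with one column per name
    for i in range(1, num_nodes - 1):
        # Each named edge is a struct column, so its fields use dot syntax.
        paths = paths.filter(f"e{i}.{time_column} < e{i + 1}.{time_column}")
    return paths


if __name__ == "__main__":
    print(path_motif(4))
```

Because find() returns an ordinary Spark DataFrame, any further select, filter or aggregation can be chained onto the result.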

License

MIT license