Bite-Sized Neo4j for Data Scientists

Written by: Dr. Clair J. Sullivan, Data Science Advocate, Neo4j

email: clair.sullivan@neo4j.com

Twitter: @CJLovesData1

Last updated: January 30, 2024

All notebooks can be found in notebooks/. Some videos are based strictly on Cypher queries, which can be found in cypher_queries/.

🆘 IMPORTANT NOTES FOR NEO4j v5 🆘

The content developed in this series was done using Neo4j v4. However, in 2022 Neo4j v5 was released. This version has some breaking changes that require adjustments to several of the queries herein. You can read more about those changes in the Cypher Manual - Removals, deprecations, additions, and extensions as well as this blog post.

Additionally, there are places in this series where we use the py2neo Python driver. This driver went through some "changes" in the fall of 2023, and I have not had the opportunity to go through this code to see if any of them break what is in here. Just a heads-up.

If you find anything here that broke, please feel free to create an issue or, better yet, open a PR!!!

THIS SERIES IS ON HIATUS FOR A WHILE!!!

Stay tuned to the Neo4j YouTube channel for new episodes coming soon!

Note:

The notebooks in this repository are not meant to be stand-alone and thus are not commented. They go with the videos, so you are encouraged to watch the videos first and then consult the notebooks should you wish to look at the actual code in depth.

Videos

✨ ✨ Find this video series as its own webpage on the Neo4j website!!! ✨ ✨

Complete YouTube playlist of full series

Part 1: Connect from Jupyter to a Neo4j Sandbox

Part 2: Using the py2neo Python Driver

Part 3: Using the Neo4j Python Driver

Part 4: Basic Cypher Queries (and with Google Colab)

This video uses a Google Colab notebook, which can be found here

Part 5: Populating the Database from Pandas

This video refers to a YouTube video on how to create efficient Cypher queries, which is linked in the references below.
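One common way to keep imports from a DataFrame efficient is to send rows to the database in batches and let a single `UNWIND` query handle each batch. The following is a minimal sketch of that idea, not necessarily the approach taken in the video. The `Person` label, the `name` and `age` properties, and the `batches` helper are all hypothetical names chosen for illustration.

```python
from typing import Dict, Iterator, List

# Hypothetical query: the Person label and the name/age properties are
# illustrative assumptions, not the schema used in the video.
UNWIND_QUERY = """
UNWIND $rows AS row
MERGE (p:Person {name: row.name})
SET p.age = row.age
"""


def batches(rows: List[Dict], size: int) -> Iterator[List[Dict]]:
    """Yield consecutive slices of `rows`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


# With pandas, `rows` would typically come from df.to_dict("records").
rows = [{"name": f"person_{i}", "age": 20 + i} for i in range(5)]

for batch in batches(rows, size=2):
    # With the official Neo4j Python driver this step would be roughly:
    #   session.run(UNWIND_QUERY, rows=batch)
    print(len(batch), "rows in this batch")
```

Sending one parameterized query per batch, rather than one query per row, reduces the number of round trips to the database. The batch size here is tiny only to keep the example readable.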

Part 6: Populating the Database with LOAD CSV

This video references this GitHub repo that has the data used in this part.

Part 7: Populating the Database with the neo4j-admin tool

This video works from the command line using Docker. The shell commands are provided in GitHub gists, which can be found here.
The data for this part can be found in data/ (the files are got-s1-nodes.csv and got-s1-edges.csv).

Part 8: Populating the Database from a JSON file

This video references a JSON file I created for my NODES 2021 tutorial, "Creating a Knowledge Graph with Neo4j: A Simple Machine Learning Approach."
- Repository for the workshop: Contains the JSON file
  - I have also put this file in the data/ directory of this repository, but the Cypher query I used in the video (cypher_queries/part8.cql) uses the workshop repo.
- Video of the workshop

Part 9: Cypher Queries 2

Part 10: Creating In-Memory Graphs with Cypher Projections

Part 11: Import RDF Data from Wikidata

To query Wikidata, it is helpful to know how to use SPARQL. The query builder that I showed (which has several great example queries) can be found here. Wikidata also provides a good SPARQL tutorial.
This video shows the use of Neosemantics for importing the RDF data. See below in the References for docs on how to use it.
This video also very quickly demonstrates Neo4j Bloom for visualization and queries. For an in-depth look at how to use Bloom, see this video.

Part 12: Creating In-Memory Graphs with Native Projections

This is the sister video for Part 10, which explored the other method for creating in-memory graphs.

Part 13: Calculating Centrality

Part 14: Community Detection with the Louvain Method

Part 15: Community Detection via Weakly Connected Components

Part 16: Using Strongly Connected Components to Detect Communities

Part 17: Creating FastRP Graph Embeddings

For more information on how FastRP works, see the following blog posts:
- Behind the scenes on the Fast Random Projection algorithm for generating graph embeddings
- Making FastRP Graph Embeddings Work for You

Part 18: Putting Graph Embeddings into a Machine Learning Model

This video moves quickly! It will be important to read this blog post, particularly for understanding how to get the embeddings into a format for the machine learning model.

Part 19: Starting with a SQL table...

This video is the start of a series looking at why we might want to go from SQL to a graph database
It is based off of the graph data that can be found here
I use PostgreSQL for my demonstrations, but you can use your SQL of choice
All queries to populate your database are in ./sql_queries/part19

Part 20: ...And compare it to a graph... (2/n)

This video builds off of Part 19, using the same data imported into Neo4j
To create the CSV files used for this graph, I exported each of the tables in Part 19 directly from Postgres via pgAdmin
- I made some tweaks of the headers to get them into Neo4j via LOAD CSV easily
- The data files can be found in ./data

Part 21: An example of when querying a graph can be easier than SQL (3/n)

This video builds off of Parts 19 and 20 of this series
If you do not already have a Neo4j database populated with this data, follow the instructions in Part 20 or run the script ./cypher_queries/part20.cql to populate the database

Part 22: A side-by-side calculation of degree using SQL and Neo4j (4/n)

This video builds off of Parts 19-21 of this series
If you do not already have a SQL database populated with this data, use the queries in ./sql_queries/part19/
If you do not already have a Neo4j database populated with this data, follow the instructions in Part 20 or run the script ./cypher_queries/part20.cql to populate the database
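As a language-neutral reference for what both the SQL and the Neo4j calculations produce, degree can be sketched in a few lines of plain Python. The edge list below is a hypothetical toy example, not the data used in this series.

```python
from collections import Counter

# Hypothetical directed edge list of (source, target) pairs.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "a")]

# Out-degree counts how often a node appears as a source,
# in-degree counts how often it appears as a target.
out_degree = Counter(src for src, _ in edges)
in_degree = Counter(dst for _, dst in edges)

# Total degree is the sum of both; Counter returns 0 for missing keys.
nodes = sorted({node for edge in edges for node in edge})
degree = {node: out_degree[node] + in_degree[node] for node in nodes}

print(degree)
```

In SQL the same result generally requires grouping the edge table twice (once per column) and combining the counts, whereas in a graph database the relationships attached to each node can be counted directly.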

Part 23: PageRank done two ways (5/n)

This video builds off of Parts 19-22 of this series
We will be using a very simplistic graph for this demonstration
The PageRank SQL query was taken from this Stack Overflow post, which was originally written for T-SQL and has been modified in this repo to work in PostgreSQL
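For readers who want to see the calculation itself, here is a minimal power-iteration sketch of PageRank in plain Python. It is neither the SQL query nor the Neo4j procedure from the video. The four-node graph, the damping factor of 0.85, and the fixed iteration count are assumptions made for illustration.

```python
# Hypothetical directed graph as an adjacency list: node -> nodes it links to.
graph = {
    "a": ["b"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}


def pagerank(graph, damping=0.85, iterations=100):
    """Return a dict of PageRank scores computed by simple power iteration."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not graph[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        # Every node passes an equal share of its rank to each of its targets.
        for node in nodes:
            for target in graph[node]:
                new_rank[target] += damping * rank[node] / len(graph[node])
        rank = new_rank
    return rank


scores = pagerank(graph)
for node, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(node, round(score, 4))
```

In this toy graph, node `c` receives links from two nodes and ends up with the highest score, while `d` has no incoming links and keeps only the baseline share.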

Part 24: Why graphs? (6/6)

This video builds off of Parts 19-23 of this series
This is the final video in the mini series-within-a-series for the SQL vs. Neo4j comparisons

Part 25: Creating a graph for a Kaggle competition

This video is based off of the H&M Personalized Fashion Recommendations Kaggle competition
The original data can be found and downloaded from the Kaggle public API via their CLI tool, assuming you have a Kaggle account
- For information on how to use the Kaggle public API, see this article

Part 26: Creating a graph model of the Kaggle competition (2/n)

This video is based off of Part 25, which uses the H&M Personalized Fashion Recommendations Kaggle competition
There is no code used in this part
If you would like to make an image of a graph model for yourself, check out arrows.app

Part 27: Node similarity of Kaggle competition graph (3/n)

This video is based off of Parts 25 and 26, which use the H&M Personalized Fashion Recommendations Kaggle competition
If you need a refresher on how to create an in-memory graph projection as is done in this video, please consult Part 12
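Node similarity compares two nodes by the neighbors they share. A common metric for this is the Jaccard index: the size of the intersection of the two neighbor sets divided by the size of their union. The sketch below illustrates that metric in plain Python. The customers and articles are made-up examples, not data from the competition.

```python
from itertools import combinations

# Hypothetical neighbor sets: each customer maps to the articles they bought.
purchases = {
    "customer_1": {"shirt", "jeans", "hat"},
    "customer_2": {"shirt", "jeans"},
    "customer_3": {"dress"},
}


def jaccard(a, b):
    """Return |a & b| / |a | b|, or 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Compare every pair of customers once.
for left, right in combinations(sorted(purchases), 2):
    score = jaccard(purchases[left], purchases[right])
    print(left, right, round(score, 3))
```

Two customers who bought exactly the same articles would score 1.0, and customers with nothing in common score 0.0.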

Part 28: Using KNN to identify similar items of Kaggle competition graph (4/n)

This video is based off of Parts 25-27
If you need a refresher on how to create an in-memory graph projection as is done in this video, please consult Part 12
In this video we will do some very basic feature engineering to explore the K-Nearest Neighbors for each article of clothing to obtain similar articles
(The next video will also do KNN, but using some much more sophisticated features!)
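To make the idea concrete, the following is a minimal brute-force sketch of K-Nearest Neighbors over small feature vectors. It is written in plain Python and is not the GDS implementation used in the video. The article names, the three-element feature vectors, and the choice of cosine similarity are assumptions for illustration.

```python
import math

# Hypothetical feature vectors, one per article of clothing.
features = {
    "article_a": [1.0, 0.0, 1.0],
    "article_b": [1.0, 0.0, 0.9],
    "article_c": [0.0, 1.0, 0.0],
    "article_d": [0.9, 0.1, 1.0],
}


def cosine(u, v):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def top_k(name, k):
    """Return the k articles most similar to `name`, best match first."""
    scores = [
        (other, cosine(features[name], vector))
        for other, vector in features.items()
        if other != name
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


for other, score in top_k("article_a", k=2):
    print(other, round(score, 4))
```

This brute-force version compares every pair of articles, which is fine for a handful of items but grows quadratically with the number of nodes.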

Part 29: Using KNN with more sophisticated feature vectors (5/n)

This video is based off of Parts 25-28

Part 30: Introducing GDS 2.0!

This video just scratches the surface of all of the new offerings within GDS 2.0, but focuses on the new GDS Python Client
