cj2001/bite_sized_data_science

Bite-Sized Neo4j for Data Scientists

Written by: Dr. Clair J. Sullivan, Data Science Advocate, Neo4j

Twitter: @CJLovesData1

Last updated: January 30, 2024

All notebooks can be found in notebooks/. Some videos are based strictly on Cypher queries, which can be found in cypher/.

🆘 IMPORTANT NOTES FOR Neo4j v5 🆘

The content in this series was developed using Neo4j v4. However, in 2023 Neo4j v5 was released. This version has some breaking changes that require adjustments to several of the queries herein. You can read more about those changes in the Cypher Manual - Removals, deprecations, additions, and extensions as well as this blog post.

Additionally, some parts of this series use the py2neo Python driver. This driver went through some "changes" in the fall of 2023, and I have not had the opportunity to go through this code to see whether any of them break what is in here. Just a heads-up.

If you find anything here that is broken, please feel free to create an issue or, better yet, open a PR!!!

THIS SERIES IS ON HIATUS FOR A WHILE!!!

Stay tuned to the Neo4j YouTube channel for new episodes coming soon!

Note:

The notebooks in this repository are not meant to be stand-alone and thus are not commented. They accompany the videos, so you are encouraged to watch the videos and then consult the notebooks should you wish to look at the actual code in depth.

Videos

✨ ✨ Find this video series as its own page on the Neo4j website!!! ✨ ✨

Part 1: Connect from Jupyter to a Neo4j Sandbox

Part 2: Using the py2neo Python Driver

Part 3: Using the Neo4j Python Driver

Part 4: Basic Cypher Queries (and with Google Colab)

  • This video uses a Google Colab notebook, which can be found here

Part 5: Populating the Database from Pandas

  • This video refers to a YouTube video on how to create efficient Cypher queries, which is linked in the references below.

Part 6: Populating the Database with LOAD CSV

Part 7: Populating the Database with the neo4j-admin tool

  • This video works from the command line using Docker. The shell commands are provided in GitHub gists, which can be found here.
  • The data for this part can be found in data/ (the files are got-s1-nodes.csv and got-s1-edges.csv).

Part 8: Populating the Database from a JSON file

  • This video references a JSON file I created for my NODES 2021 tutorial, "Creating a Knowledge Graph with Neo4j: A Simple Machine Learning Approach."

Part 9: Cypher Queries 2

Part 10: Creating In-Memory Graphs with Cypher Projections

Part 11: Import RDF Data from Wikidata

  • To query Wikidata, it is helpful to know how to use SPARQL. The query builder that I showed (which has several great example queries) can be found here. Wikidata also provides a good SPARQL tutorial.
  • This video shows the use of Neosemantics for importing the RDF data. See below in the References for docs on how to use it.
  • This video also very quickly demonstrates Neo4j Bloom for visualization and queries. For an in-depth look at how to use Bloom, see this video.

Part 12: Creating In-Memory Graphs with Native Projections

  • This is the sister video for Part 10, which explored the other method for creating in-memory graphs.

Part 13: Calculating Centrality

Part 14: Community Detection with the Louvain Method

Part 15: Community Detection via Weakly Connected Components
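
In GDS, Weakly Connected Components is run as a library procedure; the sketch below is only a plain union-find version for intuition, not the GDS implementation. It assumes a hypothetical set of node names and directed edges. Direction is ignored when merging, which is what makes the components "weakly" connected: A and C both point at B, so all three land in one component even though no directed path runs from A to C.

```python
def weakly_connected_components(nodes, edges):
    """Label each node with a component id using union-find.

    Edge direction is ignored, so the result is the set of weakly
    connected components of the (directed) graph.
    """
    parent = {node: node for node in nodes}

    def find(node):
        # Walk up to the root, halving the path as we go (path compression).
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for source, target in edges:
        root_s, root_t = find(source), find(target)
        if root_s != root_t:
            parent[root_t] = root_s  # merge the two components

    # The root of each node's tree serves as its component id.
    return {node: find(node) for node in nodes}


# Hypothetical directed edges: A->B and C->B form one component,
# D->E forms another, and F is isolated.
nodes = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "B"), ("C", "B"), ("D", "E")]
components = weakly_connected_components(nodes, edges)
print(components)
```

The same toy graph is a useful contrast for Part 16: A, B, and C are weakly connected, but none of them can reach the others by following edge direction, so each would be its own strongly connected component.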

Part 16: Using Strongly Connected Components to Detect Communities

Part 17: Creating FastRP Graph Embeddings

Part 18: Putting Graph Embeddings into a Machine Learning Model

  • This video moves quickly! It will be important to read this blog post, particularly for understanding how to get the embeddings into a format for the machine learning model.

Part 19: Starting with a SQL table...

  • This video is the start of a series looking at why we might want to go from SQL to a graph database
  • It is based off of the graph data that can be found here
  • I use PostgreSQL for my demonstrations, but you can use your SQL database of choice
  • All queries to populate your database are in ./sql_queries/part19

Part 20: ...And compare it to a graph... (2/n)

  • This video builds off of Part 19, using the same data imported into Neo4j
  • To create the CSV files used for this graph, I exported each of the tables in Part 19 directly from Postgres via pgAdmin
    • I tweaked some of the headers so the files could be loaded into Neo4j easily via LOAD CSV
    • The data files can be found in ./data

Part 21: An example of when querying a graph can be easier than SQL (3/n)

  • This video builds off of Parts 19 and 20 of this series
  • If you do not already have a Neo4j database populated with this data, follow the instructions in Part 20 or run the script ./cypher_queries/part20.cql to populate the database

Part 22: A side-by-side calculation of degree using SQL and Neo4j (4/n)

  • This video builds off of Parts 19-21 of this series
  • If you do not already have a SQL database populated with this data, use the queries in ./sql_queries/part19/
  • If you do not already have a Neo4j database populated with this data, follow the instructions in Part 20 or run the script ./cypher_queries/part20.cql to populate the database
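
Both sides of the Part 22 comparison boil down to counting how many relationships touch each node. As a language-neutral reference point, here is a minimal pure-Python sketch of that count. The edge list and the function name `degree_counts` are hypothetical, and the sketch assumes relationships are treated as undirected (each edge adds one to both endpoints); counting only incoming or only outgoing relationships would give different numbers.

```python
from collections import Counter


def degree_counts(edges):
    """Return a Counter mapping each node to its undirected degree.

    Each (source, target) pair adds one to both endpoints, mirroring a
    count of relationships matched without regard to direction.
    """
    degrees = Counter()
    for source, target in edges:
        degrees[source] += 1
        degrees[target] += 1
    return degrees


# Hypothetical toy edge list, not the data from Part 19.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
print(sorted(degree_counts(edges).items()))
# [('A', 2), ('B', 2), ('C', 3), ('D', 1)]
```

In SQL the same idea usually means stacking the source and target columns and grouping by node; in a graph the relationships are already attached to each node, which is the contrast the video draws.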

Part 23: PageRank done two ways (5/n)

  • This video builds off of Parts 19-22 of this series
  • We will be using a very simplistic graph for this demonstration
  • The PageRank SQL query was taken from this Stack Overflow post, which was originally written for T-SQL and has been modified in this repo to work in PostgreSQL
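
The Stack Overflow query itself is not reproduced here. As a reference point for what both the SQL and the Neo4j versions are computing, here is a minimal power-iteration sketch of PageRank in plain Python. The graph, the function name `pagerank`, the damping factor of 0.85 (a common default), and the fixed 50 iterations are all assumptions for illustration. It uses the normalized formulation in which scores sum to 1 and assumes every node has at least one outgoing edge (no dangling nodes). Other implementations may scale scores differently or treat dangling nodes differently, so compare rankings rather than raw values.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Compute PageRank by power iteration.

    Assumes every node is a key in out_links and has at least one
    outgoing link (no dangling nodes). Scores are normalized to sum to 1.
    """
    nodes = list(out_links)
    n = len(nodes)
    ranks = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the random-jump ("teleport") share.
        new_ranks = {node: (1.0 - damping) / n for node in nodes}
        # Each node splits its current rank evenly among its outgoing links.
        for node, targets in out_links.items():
            share = damping * ranks[node] / len(targets)
            for target in targets:
                new_ranks[target] += share
        ranks = new_ranks
    return ranks


# Hypothetical four-node graph, not the one used in the video.
graph = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
scores = pagerank(graph)
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(node, round(score, 3))
```

A fixed iteration count keeps the sketch short; a production version would typically stop once the scores change by less than some tolerance between rounds.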

Part 24: Why graphs? (6/6)

  • This video builds off of Parts 19-23 of this series
  • This is the final video in the mini series-within-a-series for the SQL vs. Neo4j comparisons

Part 25: Creating a graph for a Kaggle competition

Part 26: Creating a graph model of the Kaggle competition (2/n)

Part 27: Node similarity of Kaggle competition graph (3/n)

Part 28: Using KNN to identify similar items of Kaggle competition graph (4/n)

  • This video is based off of Parts 25-27
  • If you need a refresher on how to create an in-memory graph projection as is done in this video, please consult Part 12
  • In this video we will do some very basic feature engineering to explore the K-Nearest Neighbors for each article of clothing to obtain similar articles
  • (The next video will also do KNN, but using some much more sophisticated features!)
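
Part 28 finds the K-Nearest Neighbors of each article from simple engineered features. As an intuition aid, here is a brute-force KNN sketch using cosine similarity in plain Python. The item names, the three-element feature vectors (hypothetically [is_tshirt, is_dress, is_red]), and the helper names are all made up for illustration. The GDS KNN procedure is configurable and built to scale to large graphs, so treat this as an illustration of the idea rather than of its implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def k_nearest_neighbors(features, k=2):
    """Brute-force KNN: for each item, return its k most similar other items."""
    neighbors = {}
    for item, vector in features.items():
        scored = [
            (other, cosine_similarity(vector, other_vector))
            for other, other_vector in features.items()
            if other != item
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        neighbors[item] = scored[:k]
    return neighbors


# Hypothetical feature vectors: [is_tshirt, is_dress, is_red].
features = {
    "red_tshirt": [1.0, 0.0, 1.0],
    "blue_tshirt": [1.0, 0.0, 0.0],
    "red_dress": [0.0, 1.0, 1.0],
    "blue_dress": [0.0, 1.0, 0.0],
}
for item, nearest in k_nearest_neighbors(features, k=1).items():
    print(item, "->", nearest)
```

The brute-force loop compares every pair of items, which is quadratic in the number of items. That is fine for a toy example but is exactly the cost a graph-scale KNN implementation tries to avoid.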

Part 29: Using KNN with more sophisticated feature vectors (5/n)

  • This video is based off of Parts 25-28

Part 30: Introducing GDS 2.0!

  • This video only scratches the surface of all the new offerings in GDS 2.0 and focuses on the new GDS Python Client

References
