AWS Neptune Update: Machine Learning, Data Science, and the Future of Graph Databases


Data models and query languages are admittedly somewhat dry topics for people outside the inner circle of connoisseurs. Graph data models and query languages are no exception to this rule, but we’ve tried to keep up with developments in this area for one main reason.

Graph is the fastest-growing area in the largest segment of enterprise software: databases. Case in point: a recent series of funding rounds, culminating in Neo4j’s $325 million Series F, brought its valuation to over $2 billion.

Neo4j is one of the longest-established graph database vendors, and now also the best funded. But that doesn’t mean it’s the only one worth watching. AWS entered the graph database market in 2018 with Neptune, and it has made a lot of progress since then.

Today, AWS is unveiling support for openCypher, the open source query language based on Neo4j’s Cypher. We take the opportunity to explain what this means and how it relates to the future of graph databases, as well as to review interesting developments in Neptune’s support for machine learning and data science.

Building bridges with openCypher

Developers can now use openCypher, a popular graph query language, with Amazon Neptune, which gives them more choices for building or migrating graph applications. Neptune now supports the three most popular graph query languages: openCypher, Gremlin, and SPARQL.

In addition, Neptune will add support for Bolt, Neo4j’s binary protocol. This should let customers keep using familiar, existing tools: tooling from Neo4j, to be more precise. But there are other reasons why this is important.

There are two main data models used to model graphs: RDF and Labeled Property Graph (LPG). Neptune supports both, with SPARQL serving as the query language for RDF and Gremlin serving as the query language for LPG. Gremlin has a lot going for it: it has almost ubiquitous support and offers a lot of control over graph traversals. But it can also be a problem.

Gremlin, which is part of the Apache TinkerPop project, is an imperative query language. This means that unlike declarative query languages such as SQL, Cypher, and SPARQL, Gremlin queries express not only what to retrieve but also how to retrieve it. In this respect, Gremlin is more like a programming language.
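The contrast can be sketched with one question asked both ways: whom does Alice know? The two query strings below are only illustrative, and the toy in-memory graph and the helper names (`EDGES`, `knows_imperative`, `match_pattern`) are assumptions made up for this example, not part of Neptune or TinkerPop. No database is contacted.

```python
# Illustrative query strings for the same question: "Whom does Alice know?"
OPENCYPHER_QUERY = (
    "MATCH (a:Person {name: 'Alice'})-[:KNOWS]->(b:Person) RETURN b.name"
)
GREMLIN_QUERY = "g.V().has('Person', 'name', 'Alice').out('KNOWS').values('name')"

# Hypothetical toy graph: (source, edge label, target) triples.
EDGES = [
    ("Alice", "KNOWS", "Bob"),
    ("Alice", "KNOWS", "Carol"),
    ("Bob", "KNOWS", "Dave"),
    ("Alice", "WORKS_AT", "Acme"),
]


def knows_imperative(start):
    """Imperative style: the caller spells out each traversal step."""
    results = []
    for source, label, target in EDGES:  # step 1: visit every edge
        if source != start:              # step 2: keep edges leaving `start`
            continue
        if label != "KNOWS":             # step 3: keep only KNOWS edges
            continue
        results.append(target)           # step 4: emit the target name
    return results


def match_pattern(pattern):
    """Declarative style: the caller states a pattern, the matcher decides how.

    `pattern` is a (source, label, target) triple where None is a wildcard.
    """
    return [
        edge[2]
        for edge in EDGES
        if all(want is None or want == got for want, got in zip(pattern, edge))
    ]


if __name__ == "__main__":
    print(knows_imperative("Alice"))
    print(match_pattern(("Alice", "KNOWS", None)))
```

Both functions return the same names. The difference lies in who decides the order of the steps: the caller in the first case, the matcher in the second.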


Amazon Neptune Architecture. Neptune’s abilities are now enhanced by its support for openCypher, which brings more flexibility to its arsenal.


Not all users are comfortable with Gremlin in all scenarios. If they wanted to use the LPG model, however, Gremlin was their only option. Amazon, despite employing some key contributors to Apache TinkerPop, seems to recognize this. The addition of openCypher support makes working with the LPG engine in Neptune more accessible.

Neptune’s support for LPG and RDF is possible because it hosts two different engines under its hood, one for each data model. Adding support for openCypher doesn’t change that, at least not yet. But RDF* might. RDF*, also known as RDF-star, is an update to the RDF standard that also allows it to model LPG graphs.
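To make the idea concrete, here is a minimal sketch of how a property on an LPG edge could be written in RDF-star's Turtle-star syntax, where a quoted triple `<< s p o >>` becomes the subject of another statement. The function name `lpg_edge_to_rdf_star`, the sample data, and the default `:` prefix (which would need an `@prefix` declaration in a real file) are assumptions for illustration. This is not how Neptune maps between its engines.

```python
def lpg_edge_to_rdf_star(source, label, target, properties, prefix=":"):
    """Render one LPG edge and its properties as Turtle-star statements.

    Only string and integer property values are handled in this sketch.
    """
    triple = f"{prefix}{source} {prefix}{label} {prefix}{target}"
    lines = [f"{triple} ."]  # the edge itself, as a plain RDF triple
    for key, value in sorted(properties.items()):
        literal = f'"{value}"' if isinstance(value, str) else str(value)
        # The quoted triple << ... >> is the subject of the annotation.
        lines.append(f"<< {triple} >> {prefix}{key} {literal} .")
    return lines


if __name__ == "__main__":
    for line in lpg_edge_to_rdf_star("alice", "knows", "bob", {"since": 2019}):
        print(line)
```

One caveat of this simple mapping: an LPG can hold several distinct edges with the same label between the same two nodes, while a quoted triple identifies a single statement, so such parallel edges would need extra modeling.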

Work is underway in this area in the RDF and LPG working groups. Besides Amazon with Neptune, other RDF vendors are also adding experimental support for openCypher. The big picture here is GQL, an ISO-sanctioned work in progress.

GQL is a new standard for graph query languages, aimed at unifying what is today a fragmented landscape. GQL is expected to do for graph databases what SQL has done for relational databases. Amazon is active in both the RDF* and GQL efforts.

Ultimately, this should allow Neptune to unify its two currently disparate engines. But the story here is bigger than just Amazon. The promise is that what Amazon will be able to do under the hood, all graph database users should be able to do on their systems: use one data model and one query language.

Data Science and Machine Learning Features: Notebooks and Graph Neural Networks

GQL still has some way to go. Standardization efforts are always complicated, and adoption across the board is not guaranteed either. But Neptune also illustrates another important development in graph databases: the integration of data science and machine learning features.

Developing graph applications and navigating graph results are greatly facilitated by IDEs and visual exploration tools tailored for this purpose. While many graph database vendors have built such tools into their offerings, Neptune relied exclusively on third-party integrations until recently.

The Neptune team has chosen to fill this gap by developing AWS Graph Notebook. Notebooks are very popular among data scientists and machine learning practitioners, allowing them to mix and match code, data, visualization and documentation, and to work collaboratively.

AWS Graph Notebook is an open source Python package for Jupyter notebooks to support graph visualization. It supports both Gremlin and SPARQL, and we believe it will support openCypher as well. Although notebooks were initially adopted by the data science and machine learning crowds, Amazon seems to believe that they are going to spread among developers as well.


Neptune ML is the codename Amazon gave to the integration between its Neptune graph database and its graph machine learning capabilities in SageMaker and DGL.


We will have to wait to see if this bet pays off. What is certain, however, is that notebook support reinforces Neptune’s appeal for data science and machine learning use cases. But that’s not all Neptune has to offer there: enter Neptune ML.

Amazon is promoting Neptune ML as a way to make easy, fast, and accurate predictions on graphs with graph neural networks (GNNs). Neptune ML is powered by Amazon SageMaker and the open source Deep Graph Library (DGL), to which Amazon contributes.

GNNs are a relatively new branch of deep learning, with the interesting characteristic that they exploit the additional contextual information that graph data models capture to train deep learning algorithms. GNNs are considered to be at the cutting edge of machine learning, and they may achieve better accuracy in making predictions than conventional neural networks.

The integration of GNNs with graph databases is a natural match. GNNs can be used for node-level and edge-level predictions, i.e. they can infer additional data and connections in graphs. They can be used to train models to infer properties for use cases such as fraud prediction, ad targeting, customer 360, recommendations, identity resolution, and knowledge graph completion.
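As a rough illustration of how a GNN uses graph context, the sketch below runs one round of neighbor averaging over a toy graph and then scores a candidate edge from the resulting node vectors. It is a deliberately simplified assumption: real GNNs add learned weights, nonlinearities, and training, and nothing here reflects the Neptune ML or DGL APIs. The names `aggregate`, `link_score`, `FEATURES`, and `NEIGHBORS` are invented for this example.

```python
def aggregate(features, neighbors):
    """One message-passing round: each node averages itself and its neighbors."""
    updated = {}
    for node, vector in features.items():
        group = [vector] + [features[n] for n in neighbors.get(node, [])]
        updated[node] = [sum(values) / len(group) for values in zip(*group)]
    return updated


def link_score(embeddings, a, b):
    """Edge-level prediction signal: dot product of two node embeddings."""
    return sum(x * y for x, y in zip(embeddings[a], embeddings[b]))


# Hypothetical toy graph: two-dimensional node features and adjacency lists.
FEATURES = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
NEIGHBORS = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}

if __name__ == "__main__":
    embeddings = aggregate(FEATURES, NEIGHBORS)
    print(embeddings["a"])                    # node-level representation
    print(link_score(embeddings, "a", "c"))   # score for a possible a-c edge
```

After aggregation, each node's vector reflects its neighborhood as well as its own features. That is the contextual information a conventional neural network working on isolated rows would not see.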

Once again, Neptune is not the only one to integrate notebooks and machine learning into its offering. In addition to addressing the data science and machine learning crowd, these features can also enhance the developer and end-user experience. Better tools, better data, better analytics: all of this translates into better apps for end users. This is what all vendors are looking for.
