Stallard, A. P. (2017). Comparing Samos Document Search Performance between Apache Solr and Neo4j. Retrieved from http://purl.flvc.org/fsu/fd/FSU_SUMMER2017_Stallard_fsu_0071N_13933
The Distributed Oceanographic Match-Up Service (DOMS), currently under development, is a centralized service that allows researchers to easily match in situ and satellite oceanographic data from distributed sources to facilitate satellite calibration, validation, and retrieval algorithm development. The Shipboard Automated Meteorological and Oceanographic System (SAMOS) initiative provides routine access to high-quality marine meteorological and near-surface oceanographic observations from research vessels. SAMOS is one of several endpoints connected to the DOMS network, providing in situ data for the match-up service. DOMS in situ endpoints currently use Apache Solr as a backend search engine on each node in the distributed network. While Solr is a high-performance solution that facilitates creation and maintenance of indexed data, it is limited in the sense that its schema is fixed. The property graph model escapes this limitation by removing restrictive requirements on the data model and permitting relationships between data objects. This paper documents the development of the SAMOS Neo4j property graph database, including new search possibilities that take advantage of the property graph model, performance comparisons with Apache Solr, and a vision for graph databases as a storage tool for oceanographic data. The integration of the SAMOS Neo4j graph into DOMS is also described. Various data models are explored, including spatial-temporal records from SAMOS added to a time tree using GraphAware technology. This extension provides callable Java procedures within the Cypher query language that generate in-graph structures used in data retrieval.
Neo4j excels at performing relationship and path-based queries, which challenge relational SQL databases because their design requires memory-intensive joins. Consider a user who wants to find records over several years, but only for specific months. If a traditional database only stores timestamps, this type of query could be complex and likely prohibitively slow. Using the time tree model in a graph, one can specify a path from the root to the data that restricts retrieval to certain time frames (e.g., months). This query can be executed without joins, unions, or other compute-intensive operations, giving Neo4j a computational advantage over the SQL database alternative. That said, while this advantage may be useful, it should not be interpreted as an advantage over Solr in the context of DOMS. Solr makes use of Apache Lucene indexing at its core, while Neo4j provides its own native schema indexes. Ultimately, they each provide unique solutions for data retrieval that are geared toward specific tasks. In the DOMS setting, Solr appears to be the more suitable option, as there seem to be very few use cases where Neo4j outperforms Solr. This is primarily because the use case as a subsetting tool does not require the flexibility and path-based queries that graph database tools offer. Rather, DOMS nodes use high-performance indexing structures to quickly filter large amounts of raw data that are not deeply connected; deep connectedness is the feature of large data sets for which graph queries would become useful.
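The time tree lookup mentioned in the abstract (records over several years, but only for specific months) can be illustrated with a small in-memory sketch. This is not the thesis's Neo4j graph, the GraphAware procedures, or a Cypher query; the `TimeTree` class, its `add` and `find` methods, and the record names are all hypothetical and serve only to show how following year and month branches from a root avoids scanning every timestamp.

```python
from datetime import datetime


class TimeTree:
    """Toy in-memory time tree (hypothetical): root -> year -> month -> day -> records."""

    def __init__(self):
        self.root = {}  # {year: {month: {day: [records]}}}

    def add(self, timestamp, record):
        """Attach a record to the day node for its timestamp, creating nodes as needed."""
        year_node = self.root.setdefault(timestamp.year, {})
        month_node = year_node.setdefault(timestamp.month, {})
        month_node.setdefault(timestamp.day, []).append(record)

    def find(self, years, months):
        """Return records reached by following only the requested year and month branches."""
        hits = []
        for year in years:
            year_node = self.root.get(year, {})
            for month in months:
                month_node = year_node.get(month, {})
                for day in sorted(month_node):
                    hits.extend(month_node[day])
        return hits


# Hypothetical records; real SAMOS observations would carry many more fields.
tree = TimeTree()
tree.add(datetime(2014, 6, 3, 12, 0), "record-a")
tree.add(datetime(2014, 9, 21, 8, 30), "record-b")
tree.add(datetime(2015, 6, 15, 0, 0), "record-c")
tree.add(datetime(2016, 7, 1, 23, 59), "record-d")

# Records from June or July of 2014 through 2016.
summer = tree.find(years=range(2014, 2017), months=[6, 7])
print(summer)
```

In this sketch the September record is never visited, because the lookup descends only into the requested month branches. A graph database would express the same idea as a path pattern from the root through year and month nodes to the attached records.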
A Thesis submitted to the Department of Computer Science in partial fulfillment of the requirements for the degree of Master of Science.
Bibliography Note
Includes bibliographical references.
Advisory Committee
Peixiang Zhao, Professor Co-Directing Thesis; Shawn Smith, Professor Co-Directing Thesis; Sonia Haiduc, Committee Member; Adrian Nistor, Committee Member.
Publisher
Florida State University
Identifier
FSU_SUMMER2017_Stallard_fsu_0071N_13933