The following guest blog post was authored by Ron Bennatan, co-founder of jSonar Inc.
SonarW: An Architecture for Speed, Low Cost and Simplicity
SonarW is a purpose-built NoSQL Big Data warehouse and analytics platform for today’s flexible modern data. It is ultra-efficient, utilizing parallel processing and demanding less hardware than other approaches. Moreover, SonarW brings NoSQL simplicity to the Big Data world.
Key architectural features include:
- JSON-native columnar persistence: Works well for both structured and unstructured data. Data is always compressed and can be processed in parallel for every operation.
- Indexing and Partitioning: All data is indexed using patent-pending Big Data indexes.
- Parallel and Distributed Processing: Everything is done in parallel, both across nodes and within a node, to ensure small, cost-effective clusters.
- JSON Optimized Code: Designed from the ground up for efficient columnar JSON processing.
- Lock-less Data Structures: Built for multithreaded, multicore, and SIMD processing.
- Ease of Use: SonarW inherits its ease of use and simplicity from the NoSQL world and is 100 percent MongoDB compatible. Big Data teams are more productive and can spend less time on platform and code.
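To make the first bullet more concrete, here is a minimal sketch of what "columnar" storage of JSON documents can mean. It is an illustration only, not SonarW's actual on-disk format: the `flatten` and `to_columns` helpers and the sample documents are hypothetical, and compression and indexing are left out entirely. Each field path gets its own list of values, so a query that touches one field reads one column.

```python
from collections import defaultdict


def flatten(doc, prefix=""):
    """Yield (dotted_path, value) pairs for every leaf of a nested JSON object."""
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


def to_columns(docs):
    """Shred JSON documents into one value list per field path.

    Fields missing from a document are stored as None, so every column
    has exactly one slot per document and no schema is declared up front.
    """
    columns = defaultdict(lambda: [None] * len(docs))
    for row, doc in enumerate(docs):
        for path, value in flatten(doc):
            columns[path][row] = value
    return dict(columns)


# Hypothetical documents with differing shapes.
docs = [
    {"user": {"name": "ann", "country": "US"}, "bytes": 120},
    {"user": {"name": "bob"}, "bytes": 300, "tags": ["a", "b"]},
    {"user": {"name": "cy", "country": "US"}, "bytes": 80},
]

columns = to_columns(docs)

# An aggregate over one field only needs to scan that field's column.
total_bytes = sum(columns["bytes"])
```

Because each column is an independent list, a real engine could compress columns separately and scan chunks of them on different cores, which is the property the bullets above rely on.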
Thanks to these architectural advantages over today’s Big Data warehousing approaches, SonarW defers the need for large clusters: it scales to any size, but does not need an unreasonable number of nodes to run the workloads that other Big Data solutions handle. As a result, the platform reduces both hardware costs and the cost of managing these clusters.
Why is there a Need for a NoSQL Data Warehouse for Big Data Analytics?
Big Data implementations can be complex
Big Data is no longer a stranger to the IT world. All organizations have embarked on the Big Data path and are building data lakes, new forms of the Enterprise Data Warehouse, and more. But many of them still struggle to reap the benefits, and some are stuck in the “collection phase”. Landing the data is always the first phase, and that tends to be successful; it’s the next phase, the usage phase (such as producing useful Big Data analytics), that is hard. Some call this the “Hadoop Hangover”. Some never go past the ETL phase, using the Data Lake as no more than an ETL area and loading the data back into conventional data stores. Some give up.
When these initiatives stall, the reason is complexity. But while all this is happening, on the other “side” of the data management arena, the NoSQL world has perfected precisely the opposite: simplicity. Perhaps the main reason that NoSQL databases such as MongoDB have been so successful is their appeal to developers, who find them easy to use and who feel they are an order of magnitude more productive than in other environments.
Bringing NoSQL Simplicity to Big Data
So why not merge the two? Why not take NoSQL’s simplicity and bring it to the Big Data world? That was precisely the question we put to ourselves when we set out to build SonarW: a Big Data warehouse that has the look and feel of MongoDB, the speed and functionality of MPP RDBMS warehouses, and the scale of Hadoop.
- Simple, but not simplistic.
- Flexible, yet has enough self-describing structure to make it effective.
- Structured, but one that is easy to work with, can express anything, and can bring the simplicity and flexibility that people love.
JSON is the fastest growing data format on earth – by a lot. It is also the perfect foundation for Big Data where disparate sources need to quickly flow in and be used for deriving insight.
For SonarW, we started with JSON and asked ourselves how we could make it scale. The answer was compressed columnar storage of JSON, coupled with rich analytic pipelines that can be executed directly on the JSON data. Everything looks like a NoSQL data pipeline, similar to MongoDB, Google Dremel, or other modern data flows, but the pipelines execute on an efficient columnar fabric. There is no need to define a schema or to work hard to normalize data, and yet the data keeps enough structure that you do not lose control of it.
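As a rough illustration of a pipeline running over columns, the sketch below evaluates two MongoDB-style stages, `$match` and `$group` with `$sum`, against plain Python lists. The stage syntax follows MongoDB's aggregation pipeline, but the `run_pipeline` evaluator, the sample data, and the restriction to equality matches and a single sum are all assumptions made for this example. It says nothing about how SonarW executes pipelines internally.

```python
def run_pipeline(columns, pipeline):
    """Evaluate a tiny subset of MongoDB-style stages over column lists.

    Supports equality-only $match and a terminal $group with one $sum
    accumulator. Anything else raises ValueError.
    """
    n_rows = len(next(iter(columns.values())))
    rows = range(n_rows)  # indexes of the rows still selected
    for stage in pipeline:
        if "$match" in stage:
            for field, wanted in stage["$match"].items():
                col = columns[field]  # only the matched column is scanned
                rows = [r for r in rows if col[r] == wanted]
        elif "$group" in stage:
            spec = stage["$group"]
            key_col = columns[spec["_id"][1:]]  # "$country" -> "country"
            out_name, acc = next((k, v) for k, v in spec.items() if k != "_id")
            val_col = columns[acc["$sum"][1:]]
            totals = {}
            for r in rows:
                totals[key_col[r]] = totals.get(key_col[r], 0) + val_col[r]
            return [{"_id": k, out_name: v} for k, v in totals.items()]
        else:
            raise ValueError(f"unsupported stage: {stage}")
    return list(rows)


# Hypothetical column data: one list per field, one slot per document.
columns = {
    "country": ["US", "DE", "US", "DE", "US"],
    "status": ["ok", "ok", "err", "ok", "ok"],
    "bytes": [120, 300, 80, 50, 10],
}

pipeline = [
    {"$match": {"status": "ok"}},
    {"$group": {"_id": "$country", "total": {"$sum": "$bytes"}}},
]

result = run_pipeline(columns, pipeline)
```

The pipeline itself is just a list of JSON-like stages, which is what gives it the NoSQL look and feel, while the evaluator only ever reads the three columns the stages name.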
Efficient scalability also reduces complexity
The other goal we set for SonarW is efficiency. Everything scales horizontally these days, and SonarW is no exception. But scaling horizontally allows one to hide inefficiencies. Throw enough hardware at anything and things go fast. But it also becomes expensive, especially in the enterprise, where costs and charge-backs are high. We fondly refer to SonarW as “Big-but-Lean Data”. That is, it’s good to scale, but it’s better to do it efficiently. As an example, the figure below shows the number of nodes and costs to run the Big Data benchmark on a set of platforms. All these systems achieved the same minimal performance scores (with Redshift and SonarW being faster than the others), but the size and cost of the clusters were different (in both charts, smaller is better).
NoSQL can optimize Big Data analytics success
A NoSQL approach has proven highly successful for Big Data OLTP databases, as provided by companies such as MongoDB. However, no such capability has been available for Big Data analytics. SonarW was built from the ground up, with a JSON columnar architecture, to provide a simple NoSQL interface along with MPP speeds and efficient scalability, which together improve the developer’s ability to deliver on Big Data analytics projects.
For more information about jSonar and SonarW please visit www.jsonar.com
Big Data Benchmark: Breakthrough Cost and Performance Results
One of the benchmarks used for Big Data workloads is the “Big Data Benchmark,” which is run by the AMPLab at Berkeley. This benchmark runs workloads on representatives from the Hadoop ecosystem (e.g. Hive, Spark, Tez, etc.), as well as from MPP environments. Note SonarW’s performance and cost in comparison to Tez, Shark, Redshift, Impala and Hive.
Ron Bennatan Vita
Ron Bennatan is a co-founder at jSonar Inc. He has been a “database guy” for 25 years and has worked at companies such as J.P. Morgan, Merrill Lynch, Intel, IBM and AT&T Bell Labs. He was co-founder and CTO at Guardium, which was acquired by IBM, where he later served as a Distinguished Engineer and the CTO for Big Data Governance. He is now focused on NoSQL Big Data analytics. He has a Ph.D. in Computer Science and has authored 11 technical books.