Sumit Pal is an independent consultant in big data and data science. He works with multiple clients, advising them on their data architectures and big data solutions and providing hands-on coding with Spark, Scala, Java, and Python. Sumit has more than 22 years of experience in the software industry, in roles spanning companies from startups to enterprises.
Sumit has worked for Microsoft (SQL Server development team), Oracle (OLAP development team), and Verizon (big data analytics team) in a career spanning 22 years. He has extensive experience building scalable systems across the stack, from the middle tier and data tier to visualization for analytics applications, using big data and NoSQL databases. Sumit has deep expertise in database internals, data warehouses, dimensional modeling, and data science.
Sumit has built a Big Data Analyst Training course with Experfy. He also recently hiked to Mt. Everest Base Camp, at 18.2K feet, in October 2016. Sumit has an MS and a BS in Computer Science.
What prompted you to write this book?
The idea for the book originated when Susan McDermott from Apress walked up to me after I had finished speaking at a Big Data conference in Chicago in November 2015. She asked me to consider writing a book on the same topic that I speak about at conferences.
I thought about the idea for over a month and then decided to take up the challenge, since I felt that no single book on the market provides a good overview of the SQL on Big Data technologies out there at both the application and architectural levels. This was hard for me to do, since I had a lot of client commitments and now the additional responsibility of writing my first book. But I love doing hard things (easy things do not excite me), so it felt natural and I dived right in.
A lot of the book's content evolved from my speaking engagements at these Big Data conferences.
Can you give a short summary of the book?
The book helps readers learn about the various commercial and open source products that perform SQL on Big Data platforms. It explains the architectures of the various SQL engines in use and how these tools work internally in terms of execution, data movement, latency, scalability, performance, and system requirements.
This book consolidates in one place solutions to the challenges associated with the requirements of speed, scalability, and the variety of operations needed for data integration and SQL operations. It provides in-depth insight into the products, architectures, and innovations happening in this rapidly evolving space.
The book covers how SQL on Big Data engines are permeating the OLTP, OLAP, and operational analytics spaces, as well as the rapidly evolving HTAP systems.
One will learn the details of SQL Engines for:
Interactive Architectures―Architecture to support low latency on large data sets
Streaming Architectures―Architecture to support queries on data in motion
Operational Architectures―Architected for transactional and operational systems to support transactions on Big Data platforms
Innovative Architectures―Exploration of the rapidly evolving newer SQL engines on Big Data with innovative ideas
The book will not, however, teach you how to write SQL queries.
Who will benefit by reading this?
This book is aimed at beginner and intermediate-level professionals. Anyone who wants to learn more about what is going on in the Big Data world with SQL applications would benefit from this book.
The book provides insight into the best practices, recommendations, and guidelines to follow when choosing a SQL engine for a big data problem.
How is the book relevant now?
SQL has been in existence for the last 40 years and Big Data for the last 10 years, and their marriage is inevitable.
The whole Big Data Open Source and commercial space is teeming with innovative products and fierce competition to establish a foothold. Most professionals get lost in this forest and maze of products and it is often difficult to make product decisions based on marketing fluff.
The book consolidates in one place the various tools, products, and SQL engines for working with Big Data across a variety of data types and data sources, whether static data or fast data. It explores in depth the architectural underpinnings of some products, the major building blocks and core engineering concepts that have gone into building them, and how they address some of the newer problems SQL engines encounter with unstructured and semi-structured data.
Finally, where can readers get this book?
The book is available from Amazon/Kindle, Barnes & Noble/Nook, and Springer.com. It is also available in subscription packages (e.g., Safari, Books24x7, SpringerLink).