High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark 1st Edition, Kindle Edition
|Length: 595 pages||Enhanced Typesetting: Enabled||Page Flip: Enabled|
About the Author
Holden Karau is a transgender Canadian and an active open source contributor. When not in San Francisco working as a software development engineer at IBM's Spark Technology Center, Holden talks internationally on Apache Spark and holds office hours at coffee shops at home and abroad. She is a Spark committer with frequent contributions, specializing in PySpark and Machine Learning. Prior to IBM she worked on a variety of distributed, search, and classification problems at Alpine, Databricks, Google, Foursquare, and Amazon. She graduated from the University of Waterloo with a Bachelor of Mathematics in Computer Science. Outside of software she enjoys playing with fire, welding, scooters, poutine, and dancing.
Rachel Warren is a data scientist and software engineer at Alpine Data Labs, where she uses Spark to address real-world data processing challenges. She has experience working as an analyst both in industry and academia. She graduated with a degree in Computer Science from Wesleyan University in Connecticut. --This text refers to the paperback edition.
- ASIN : B0725YT69J
- Publisher : O'Reilly Media; 1st edition (25 May 2017)
- Language : English
- File size : 5508 KB
- Simultaneous device usage : Unlimited
- Text-to-Speech : Enabled
- Enhanced typesetting : Enabled
- X-Ray : Not Enabled
- Word Wise : Not Enabled
- Print length : 595 pages
- Best Sellers Rank: 606,302 in Kindle Store
Top reviews from other countries
For beginner Spark users, the book may feel overwhelming, particularly as it focuses on Spark RDDs rather than the more widely used Spark SQL API. I would highly recommend Chambers and Zaharia's Spark: The Definitive Guide as an alternative purchase, as it is both more comprehensive and easier to understand. For those hoping to learn Scala or Spark with Scala, this book also probably dives in way too fast, and I would recommend Chiusano and Bjarnason's excellent Functional Programming in Scala (although quite hard) and Alexander's Functional Programming Simplified.
On the positive side, the chapter on Key/Value data, although perhaps fairly widely known, was both well explained and clarifying, as was some of the information about how to make transformations more effective.
Some of the code examples are very difficult to read. On top of this, huge chunks of the book 'build upon' old examples, but this just ends up being a complete refactor of the old examples to improve them. Therefore this book can't be used as a handbook without reading it through first. Code examples should have been small and distinct.
Despite these complaints, this is a truly fantastic guide, full of straight answers that are difficult or impossible to find online or via trial and error.
The text also references unreadable Spark UI screenshots and coloured lines in black-and-white diagrams.