High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark Paperback – 2 June 2017
About the Author
Holden Karau is a transgender Canadian and an active open source contributor. When not in San Francisco working as a software development engineer at IBM's Spark Technology Center, Holden talks internationally on Apache Spark and holds office hours at coffee shops at home and abroad. She is a Spark committer with frequent contributions, specializing in PySpark and Machine Learning. Prior to IBM she worked on a variety of distributed, search, and classification problems at Alpine, Databricks, Google, Foursquare, and Amazon. She graduated from the University of Waterloo with a Bachelor of Mathematics in Computer Science. Outside of software she enjoys playing with fire, welding, scooters, poutine, and dancing.
Rachel Warren is a data scientist and software engineer at Alpine Data Labs, where she uses Spark to address real world data processing challenges. She has experience working as an analyst both in industry and academia. She graduated with a degree in Computer Science from Wesleyan University in Connecticut.
- ASIN : 1491943203
- Publisher : O'Reilly Media, Inc, USA; 1st edition (2 June 2017)
- Language : English
- Paperback : 358 pages
- ISBN-10 : 1491943203
- ISBN-13 : 978-1491943205
- Dimensions : 17.78 x 1.88 x 23.34 cm
- Best Sellers Rank: 358,069 in Books (See Top 100 in Books)
Top reviews from other countries
For beginner Spark users, the book may feel overwhelming, particularly as it focuses on Spark RDDs rather than the more widely used Spark SQL API. I would highly recommend Zaharia and Chambers' Spark: The Definitive Guide as an alternative purchase, as it is both more comprehensive and easier to understand. For those hoping to learn Scala or Spark in Scala, this book also probably dives in way too fast, and I would recommend Chiusano and Bjarnason's excellent Functional Programming in Scala (although it is quite hard) and Alexander's Functional Programming Simplified.
On the positive side, the chapter on Key/Value data, although it covers material that is perhaps fairly widely known, was both well explained and clarifying, as was some of the information about how to make transformations more effective.
Some of the code examples are very difficult to read. On top of this, huge chunks of the book 'build upon' old examples, but this just ends up being a complete refactor of the old examples to improve them. As a result, this book can't be used as a handbook without reading it through first. Code examples should have been small and distinct.
Despite these complaints, this is a truly fantastic guide, full of straight answers that are difficult or impossible to find online or via trial and error.
The text also references unreadable Spark UI screenshots and coloured lines in black-and-white diagrams.