Download the book Data Analysis with Python and PySpark

Book specifications

Data Analysis with Python and PySpark

Edition:
Authors:
Series:
Publisher:
Year of publication:
Number of pages: 259
Language: English
File format: PDF (can be converted to PDF, EPUB, or AZW3 at the user's request)
File size: 24 MB

Book price (toman): 39,000





If you would like the book Data Analysis with Python and PySpark converted to PDF, EPUB, AZW3, MOBI, or DJVU, notify support and they will convert the file for you.

Please note that Data Analysis with Python and PySpark is the original-language (English) edition, not a Persian translation. The International Library website provides only original-language books and does not offer any books translated into or written in Persian.


About the book (original-language description)



Table of contents

PySpark in Action: Python data analysis at scale MEAP V07
Copyright
Welcome
Brief contents
Chapter 1: Introduction
	1.1 What is PySpark?
		1.1.1 You saw it coming: What is Spark?
		1.1.2 PySpark = Spark + Python
		1.1.3 Why PySpark?
		1.1.4 Your very own factory: how PySpark works
		1.1.5 Some physical planning with the cluster manager
		1.1.6 A factory made efficient through a lazy manager
	1.2 What will you learn in this book?
	1.3 What do I need to get started?
	1.4 Summary
Chapter 2: Your first data program in PySpark
	2.1 Setting up the pyspark shell
		2.1.1 The SparkSession entry-point
		2.1.2 Configuring how chatty spark is: the log level
	2.2 Mapping our program
	2.3 Reading and ingesting data into a data frame
	2.4 Exploring data in the DataFrame structure
		2.4.1 Peeking under the hood: the show() method
	2.5 Moving from a sentence to a list of words
		2.5.1 Selecting specific columns using select()
		2.5.2 Transforming columns: splitting a string into a list of words
		2.5.3 Renaming columns: alias and withColumnRenamed
	2.6 Reshaping your data: exploding a list into rows
	2.7 Working with words: changing case and removing punctuation
	2.8 Filtering rows
	2.9 Summary
	2.10 Exercises
		2.10.1 Exercise 2.1
		2.10.2 Exercise 2.2
		2.10.3 Exercise 2.3
		2.10.4 Exercise 2.4
		2.10.5 Exercise 2.5
Chapter 3: Submitting and scaling your first PySpark program
	3.1 Grouping records: Counting word frequencies
	3.2 Ordering the results on the screen using orderBy
	3.3 Writing data from a data frame
	3.4 Putting it all together: counting
		3.4.1 Simplifying your dependencies with PySpark’s import conventions
		3.4.2 Simplifying our program via method chaining
	3.5 Your first non-interactive program: using spark-submit
		3.5.1 Creating your own SparkSession
	3.6 Using spark-submit to launch your program in batch mode
	3.7 What didn’t happen in this Chapter
	3.8 Scaling up our word frequency program
	3.9 Summary
	3.10 Exercises
		3.10.1 Exercise 3.1
		3.10.2 Exercise 3.2
		3.10.3 Exercise 3.3
		3.10.4 Exercise 3.4
Chapter 4: Analyzing tabular data with pyspark.sql
	4.1 What is tabular data?
		4.1.1 How does PySpark represent tabular data?
	4.2 PySpark for analyzing and processing tabular data
	4.3 Reading delimited data in PySpark
		4.3.1 Customizing the SparkReader object to read CSV data files
		4.3.2 Exploring the shape of our data universe
	4.4 The basics of data manipulation: diagnosing our centre table
		4.4.1 Knowing what we want: selecting columns
		4.4.2 Keeping what we need: deleting columns
		4.4.3 Creating what’s not there: new columns with withColumn()
		4.4.4 Tidying our data frame: renaming and re-ordering columns
		4.4.5 Summarizing your data frame: describe() and summary()
	4.5 Summary
Chapter 5: The data frame through a new lens: joining and grouping
	5.1 From many to one: joining data
		5.1.1 What’s what in the world of joins
		5.1.2 Knowing our left from our right
		5.1.3 The rules to a successful join: the predicates
		5.1.4 How do you do it: the join method
		5.1.5 Naming conventions in the joining world
	5.2 Summarizing the data via: groupby and GroupedData
		5.2.1 A simple groupby blueprint
		5.2.2 A column is a column: using agg with custom column definitions
	5.3 Taking care of null values: drop and fill
		5.3.1 Dropping it like it’s hot
		5.3.2 Filling values to our heart’s content
	5.4 What was our question again: our end-to-end program
	5.5 Summary
	5.6 Exercises
		5.6.1 Exercise 5.4
		5.6.2 Exercise 5.5
		5.6.3 Exercise 5.6
Chapter 6: Multi-dimensional data frames: using PySpark with JSON data
	6.1 Open sesame: what does your data tell you?
	6.2 The first step in understanding our data: PySpark’s scalar types
		6.2.1 String and bytes
		6.2.2 The numerical tower(s): integer values
		6.2.3 The numerical tower(s): double, floats and decimals
		6.2.4 Date and timestamp
		6.2.5 Null and boolean
	6.3 PySpark’s complex types
		6.3.1 Complex types: the array
		6.3.2 Complex types: the map
	6.4 Structure and type: The dual-nature of the struct
		6.4.1 A data frame is an ordered collection of columns
		6.4.2 The second dimension: just enough about the row
		6.4.3 Casting your way to sanity
		6.4.4 Defaulting values with fillna
	6.5 Summary
	6.6 Exercises
		6.6.1 Exercise 6.1
		6.6.2 Exercise 6.2
Chapter 7: Bilingual PySpark: blending Python and SQL code
	7.1 Banking on what we know: pyspark.sql vs plain SQL
	7.2 Using SQL queries on a data frame
		7.2.1 Promoting a data frame to a Spark table
		7.2.2 Using the Spark catalog
	7.3 SQL and PySpark
	7.4 Using SQL-like syntax within data frame methods
		7.4.1 Select and where
		7.4.2 Group and order by
		7.4.3 Having
		7.4.4 Create tables/views
		7.4.5 Union and join
		7.4.6 Subqueries and common table expressions
		7.4.7 A quick summary of PySpark vs. SQL syntax
	7.5 Simplifying our code: blending SQL and Python together
		7.5.1 Reading our data
		7.5.2 Using SQL-style expressions in PySpark
	7.6 Conclusion
	7.7 Summary
	7.8 Exercises
		7.8.1 Exercise 7.1
		7.8.2 Exercise 7.2
		7.8.3 Exercise 7.3
		7.8.4 Exercise 7.4
Chapter 8: Extending PySpark with user-defined-functions
	8.1 PySpark, freestyle: the resilient distributed dataset
		8.1.1 Manipulating data the RDD way: map, filter and reduce
	8.2 Using Python to extend PySpark via user-defined functions
		8.2.1 It all starts with plain Python: using typed Python functions
		8.2.2 From Python function to UDF: two approaches
	8.3 Big data is just a lot of small data: using pandas UDF
		8.3.1 Setting our environment: connectors and libraries
		8.3.2 Preparing our data
		8.3.3 Scalar UDF
		8.3.4 Grouped map UDF
		8.3.5 Grouped aggregate UDF
		8.3.6 Going local to troubleshoot pandas UDF
	8.4 Summary
	8.5 Exercises
		8.5.1 Exercise 8.1
		8.5.2 Exercise 8.2
		8.5.3 Exercise 8.3
		8.5.4 Exercise 8.4
		8.5.5 Exercise 8.5
Chapter 10: A foray into machine learning: logistic regression with PySpark
	10.1 Reading, exploring and preparing our machine learning data set
		10.1.1 Exploring our data and getting our first feature columns
		10.1.2 Addressing data mishaps and building our first feature set
		10.1.3 Getting our data set ready for assembly: null imputation and casting
	10.2 Feature engineering and selection
		10.2.1 Weeding out the rare binary occurrence columns
		10.2.2 Creating custom features
		10.2.3 Removing highly correlated features
		10.2.4 Scaling our features
		10.2.5 Assembling the final data set with the Vector column type
	10.3 Training and evaluating our model
		10.3.1 Assessing model accuracy with the Evaluator object
		10.3.2 Getting the biggest drivers from our model: extracting the coefficients
	10.4 Summary
Appendix A: Exercise solutions
Appendix B: Installing PySpark locally
	B.1 Windows
		B.1.1 Install Java
		B.1.2 Install 7-zip
		B.1.3 Download and install Apache Spark
		B.1.4 Install Python
		B.1.5 Launching an iPython REPL and starting PySpark
		B.1.6 (Optional) Install and run Jupyter to use Jupyter notebook
	B.2 macOS
		B.2.1 Install Homebrew
		B.2.2 Install Java and Spark
		B.2.3 Install Anaconda/Python
		B.2.4 Launching an iPython REPL and starting PySpark
		B.2.5 (Optional) Install and run Jupyter to use Jupyter notebook
	B.3 GNU/Linux and WSL
		B.3.1 Install Java
		B.3.2 Installing Spark
		B.3.3 Install Python 3 and IPython
		B.3.4 Launch PySpark with IPython
		B.3.5 (Optional) Install and run Jupyter to use Jupyter notebook
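As an illustration of the material the early chapters listed above walk through (reading text into a data frame, splitting lines into words, and counting word frequencies), here is a minimal PySpark sketch. It is not taken from the book; the application name and the input path are placeholders.

# A minimal word-frequency sketch in the spirit of chapters 2 and 3.
# The input path "data/sample.txt" is a hypothetical placeholder.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("word_count_sketch").getOrCreate()

# Read a plain-text file; spark.read.text yields a single column named "value".
lines = spark.read.text("data/sample.txt")

words = (
    lines
    .select(F.split(F.col("value"), " ").alias("line"))                   # string -> array of words
    .select(F.explode(F.col("line")).alias("word"))                       # one word per row
    .select(F.lower(F.col("word")).alias("word"))                         # normalize case
    .select(F.regexp_extract(F.col("word"), "[a-z']+", 0).alias("word"))  # strip punctuation
    .where(F.col("word") != "")                                           # drop empty tokens
)

# Group, count, and display the most frequent words.
results = words.groupBy("word").count().orderBy(F.col("count").desc())
results.show(10)

Chapter 3 then shows how to package a program like this as a standalone script and launch it in batch mode with spark-submit.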



