吴裕雄--天生自然 python数据分析:葡萄酒分析
# import pandas
import pandas as pd # creating a DataFrame
pd.DataFrame({'Yes': [50, 31], 'No': [101, 2]})
# another example of creating a dataframe
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 'Sue': ['Pretty good.', 'Bland']})
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'],
'Sue': ['Pretty good.', 'Bland.']},
index = ['Product A', 'Product B'])
# creating a pandas series
pd.Series([1, 2, 3, 4, 5])
# we can think of a Series as a column of a DataFrame.
# we can assign index values to Series in same way as pandas DataFrame
pd.Series([10, 20, 30], index=['2015 sales', '2016 sales', '2017 sales'], name='Product A')
# reading a csv file and storing it in a variable
wine_reviews = pd.read_csv("F:\\kaggleDataSet\\wine-reviews\\winemag-data-130k-v2.csv")
# we can use the 'shape' attribute to check size of dataset
wine_reviews.shape
# To show first five rows of data, use 'head()' method
wine_reviews.head()
wine_reviews = pd.read_csv("F:\\kaggleDataSet\\wine-reviews\\winemag-data-130k-v2.csv", index_col=0)
wine_reviews.head()
wine_reviews.head().to_csv("F:\\wine_reviews.csv")
import pandas as pd
reviews = pd.read_csv("F:\\kaggleDataSet\\wine-reviews\\winemag-data-130k-v2.csv", index_col=0)
pd.set_option("display.max_rows", 5)
reviews
# access 'country' property (or column) of 'reviews'
reviews.country
# Another way to do above operation
# when a column name contains space, we have to use this method
reviews['country']
# To access first row of country column
reviews['country'][0]
# returns first row
reviews.iloc[0]
# returns first column (country) (all rows due to ':')
reviews.iloc[:, 0]
# retruns first 3 rows of first column
reviews.iloc[:3, 0]
# we can pass a list of indices of rows/columns to select
reviews.iloc[[0, 1, 2, 3], 0]
# We can also pass negative numbers as we do in Python
reviews.iloc[-5:]
# To select first entry in country column
reviews.loc[0, 'country']
# select columns by name using 'loc'
reviews.loc[:, ['taster_name', 'taster_twitter_handle', 'points']]
# 'set_index' to the 'title' field
reviews.set_index('title')
# 1. Find out whether wine is produced in Italy
reviews.country == 'Italy'
# 2. Now select all wines produced in Italy
reviews.loc[reviews.country == 'Italy'] #reviews[reviews.country == 'Italy']
# Add one more condition for points to find better than average wines produced in Italy
reviews.loc[(reviews.country == 'Italy') & (reviews.points >= 90)] # use | for 'OR' condition
reviews.loc[reviews.country.isin(['Italy', 'France'])]
reviews.loc[reviews.price.notnull()]
reviews['critic'] = 'everyone'
reviews.critic
# using iterable for assigning
reviews['index_backwards'] = range(len(reviews), 0, -1)
reviews['index_backwards']
吴裕雄--天生自然 python数据分析:葡萄酒分析的更多相关文章
- 吴裕雄--天生自然 PYTHON数据分析:所有美国股票和etf的历史日价格和成交量分析
# This Python 3 environment comes with many helpful analytics libraries installed # It is defined by ...
- 吴裕雄--天生自然 python数据分析:健康指标聚集分析(健康分析)
# This Python 3 environment comes with many helpful analytics libraries installed # It is defined by ...
- 吴裕雄--天生自然 PYTHON数据分析:基于Keras的CNN分析太空深处寻找系外行星数据
#We import libraries for linear algebra, graphs, and evaluation of results import numpy as np import ...
- 吴裕雄--天生自然 PYTHON数据分析:钦奈水资源管理分析
df = pd.read_csv("F:\\kaggleDataSet\\chennai-water\\chennai_reservoir_levels.csv") df[&quo ...
- 吴裕雄--天生自然 PYTHON数据分析:糖尿病视网膜病变数据分析(完整版)
# This Python 3 environment comes with many helpful analytics libraries installed # It is defined by ...
- 吴裕雄--天生自然 PYTHON数据分析:人类发展报告——HDI, GDI,健康,全球人口数据数据分析
import pandas as pd # Data analysis import numpy as np #Data analysis import seaborn as sns # Data v ...
- 吴裕雄--天生自然 python数据分析:医疗费数据分析
import numpy as np import pandas as pd import os import matplotlib.pyplot as pl import seaborn as sn ...
- 吴裕雄--天生自然 python数据分析:基于Keras使用CNN神经网络处理手写数据集
import pandas as pd import numpy as np import matplotlib.pyplot as plt import matplotlib.image as mp ...
- 吴裕雄--天生自然 PYTHON数据分析:医疗数据分析
import numpy as np # linear algebra import pandas as pd # data processing, CSV file I/O (e.g. pd.rea ...
随机推荐
- Maven--Maven 入门
1.POM <?xml version="1.0" encoding="utf-8" ?> <project xmlns="http ...
- C++中free()与delete的区别
1.new/delete是C++的操作符,而malloc/free是C中的函数. 2.new做两件事,一是分配内存,二是调用类的构造函数:同样,delete会调用类的析构函数和释放内存.而malloc ...
- idea maven Running C:\Users\Administrator\AppData\Local\Temp\archetype1tmp
Running C:\Users\Administrator\AppData\Local\Temp\archetype1tmp 在IDEA中通过maven项目管理工具创建javaweb项目的时候一直卡 ...
- vue项目环境搭建与组件介绍
Vue项目环境搭建 """ node ~~ python:node是用c++编写用来运行js代码的 npm(cnpm) ~~ pip:npm是一个终端应用商城,可以换国内 ...
- 吴裕雄--天生自然python学习笔记:python 用firebase实现英汉词典进阶版
用 post 方法创建的数据会自动产生一个 id (Key ),但有时也常常为了取得这个 id 而让程序难以处理 . 以英汉词典标准版来说,它的数据结构如下: 如果将每条数据都改为{eword:cwo ...
- 吴裕雄--天生自然C语言开发:文件读写
#include <stdio.h> int main() { FILE *fp = NULL; fp = fopen("/tmp/test.txt", "w ...
- 怎么保证RabbitMQ和kafuka集群的高可用性?
rabbitMQ有三种模式:单机模式,普通集群模式,镜像集群模式 RabbitMQ的高可用性 RabbitMQ是比较有代表性的,因为是基于主从做高可用性的,我们就以他为例子讲解第一种MQ的高可用性 ...
- PAT甲级——1006 Sign In and Sign Out
PATA1006 Sign In and Sign Out At the beginning of every day, the first person who signs in the compu ...
- Mybatis的generator自动生成代码
mybatis-generator有三种用法:命令行.ide插件.maven插件.本次使用maven生成 环境:IDEA,mysql8,maven (1):新建项目,本次以SpringBoot项目为例 ...
- OpenCV 级联分类器
#include "opencv2/objdetect/objdetect.hpp" #include "opencv2/highgui/highgui.hpp" ...