吴裕雄--天生自然 Python data analysis: wine review analysis
# import pandas
import pandas as pd
# creating a DataFrame
pd.DataFrame({'Yes': [50, 31], 'No': [101, 2]})
# another example of creating a dataframe
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 'Sue': ['Pretty good.', 'Bland.']})
# the 'index' argument assigns row labels instead of the default 0, 1, ...
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'],
              'Sue': ['Pretty good.', 'Bland.']},
             index=['Product A', 'Product B'])
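A minimal sketch of what the `index` argument buys you, reusing the same toy reviews: the labels become the DataFrame's index, so rows can be looked up by name with `loc`. The variable name `df` is just for illustration.

```python
import pandas as pd

# Same toy reviews as above, with explicit row labels
df = pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'],
                   'Sue': ['Pretty good.', 'Bland.']},
                  index=['Product A', 'Product B'])

print(df.index.tolist())           # ['Product A', 'Product B']
print(df.loc['Product B', 'Sue'])  # Bland.
```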
# creating a pandas series
pd.Series([1, 2, 3, 4, 5])
# A Series can be thought of as a single column of a DataFrame.
# Index labels are assigned to a Series the same way as to a DataFrame; 'name' labels the Series itself
pd.Series([10, 20, 30], index=['2015 sales', '2016 sales', '2017 sales'], name='Product A')
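To make the "Series as a column" idea concrete, here is a small sketch using the sales Series above: `to_frame()` turns a named Series into a one-column DataFrame whose column header is the Series name, and selecting that column gives the Series back.

```python
import pandas as pd

sales = pd.Series([10, 20, 30],
                  index=['2015 sales', '2016 sales', '2017 sales'],
                  name='Product A')

# A named Series becomes a one-column DataFrame; the column is the Series name
frame = sales.to_frame()
print(frame.columns.tolist())  # ['Product A']

# Selecting that column returns a Series with the same name, index and values
col = frame['Product A']
print(col.name, col['2016 sales'])  # Product A 20
```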
# reading a csv file and storing it in a variable
wine_reviews = pd.read_csv("F:\\kaggleDataSet\\wine-reviews\\winemag-data-130k-v2.csv")
# the 'shape' attribute gives the size of the dataset as (rows, columns)
wine_reviews.shape
# To show the first five rows of data, use the 'head()' method
wine_reviews.head()
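# use the file's first column (an unnamed row number) as the index instead of adding a new default index
wine_reviews = pd.read_csv("F:\\kaggleDataSet\\wine-reviews\\winemag-data-130k-v2.csv", index_col=0)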
wine_reviews.head()
# write the first five rows to a new CSV file
wine_reviews.head().to_csv("F:\\wine_reviews.csv")
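Why read the file twice? A minimal sketch, assuming (as the second `read_csv` call implies) that the file's first column has an empty header and holds the row number. The two-row CSV below is hypothetical, read from memory with `io.StringIO` instead of the real file: without `index_col=0` that column shows up as an extra `Unnamed: 0` column.

```python
import io
import pandas as pd

# Hypothetical two-row CSV shaped like the wine reviews file:
# the first column has an empty header and holds the row number
csv_text = """,country,points
0,Italy,87
1,Portugal,87
"""

without_index = pd.read_csv(io.StringIO(csv_text))
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(without_index.columns.tolist())  # ['Unnamed: 0', 'country', 'points']
print(with_index.columns.tolist())     # ['country', 'points']
```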
import pandas as pd
reviews = pd.read_csv("F:\\kaggleDataSet\\wine-reviews\\winemag-data-130k-v2.csv", index_col=0)
# display at most 5 rows when a DataFrame is shown
pd.set_option("display.max_rows", 5)
reviews
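`pd.set_option` changes the setting for the rest of the session (`pd.reset_option("display.max_rows")` restores the default). If the limit should apply to only one display, pandas also provides the `pd.option_context` context manager. This sketch simply reads the option back to show that the old value is restored when the `with` block exits.

```python
import pandas as pd

before = pd.get_option("display.max_rows")

# Temporarily limit displayed output to 5 rows; the old value returns on exit
with pd.option_context("display.max_rows", 5):
    inside = pd.get_option("display.max_rows")

after = pd.get_option("display.max_rows")
print(inside, after == before)  # 5 True
```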
# access 'country' property (or column) of 'reviews'
reviews.country
# Bracket indexing does the same thing.
# It is required when a column name contains a space (or any character not allowed in a Python identifier)
reviews['country']
# Access the value with index label 0 in the country column
reviews['country'][0]
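A small sketch of the two access styles on a hypothetical two-row frame; the column `taster name` deliberately contains a space, so only bracket access can reach it.

```python
import pandas as pd

# Hypothetical mini-dataset; 'taster name' contains a space on purpose
df = pd.DataFrame({'country': ['Italy', 'Portugal'],
                   'taster name': ['Taster A', 'Taster B']})

# Attribute access and bracket access return the same Series
print(df.country.equals(df['country']))  # True

# A column name with a space only works with brackets
print(df['taster name'][0])  # Taster A
```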
# returns first row
reviews.iloc[0]
# returns the first column (country); ':' selects all rows
reviews.iloc[:, 0]
# returns the first 3 rows of the first column
reviews.iloc[:3, 0]
# we can pass a list of indices of rows/columns to select
reviews.iloc[[0, 1, 2, 3], 0]
# Negative positions count from the end, as with Python lists: this returns the last five rows
reviews.iloc[-5:]
# 'loc' is label-based: select the entry with index label 0 in the country column
reviews.loc[0, 'country']
# select columns by name using 'loc'
reviews.loc[:, ['taster_name', 'taster_twitter_handle', 'points']]
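One difference between `iloc` and `loc` that the examples above do not show: `iloc` slices by position and excludes the end, like Python, while `loc` slices by label and includes the end label. A sketch on a hypothetical five-row frame with the default integer index:

```python
import pandas as pd

# Hypothetical five-row frame with the default integer index 0..4
df = pd.DataFrame({'country': ['Italy', 'Portugal', 'US', 'US', 'Spain'],
                   'points': [87, 90, 85, 92, 88]})

# iloc is position-based: the slice end is excluded
print(len(df.iloc[:3]))      # 3  (positions 0, 1, 2)
# loc is label-based: the end label is included
print(len(df.loc[:3]))       # 4  (labels 0, 1, 2, 3)

print(df.iloc[-1, 0])        # Spain (last row, first column)
print(df.loc[0, 'country'])  # Italy
```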
# use 'set_index' to make the 'title' column the index (returns a new DataFrame; 'reviews' itself is unchanged)
reviews.set_index('title')
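A sketch, on a hypothetical two-row frame, showing that `set_index` returns a new DataFrame indexed by the chosen column and leaves the original untouched, so the result must be assigned if you want to keep it.

```python
import pandas as pd

# Hypothetical titles and scores
df = pd.DataFrame({'title': ['Wine X', 'Wine Y'], 'points': [87, 91]})

# Rows of the new frame are labeled by title
by_title = df.set_index('title')
print(by_title.loc['Wine Y', 'points'])  # 91

# The original frame still has 'title' as an ordinary column
print('title' in df.columns)  # True
```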
# 1. Build a Boolean Series: True for each wine produced in Italy
reviews.country == 'Italy'
# 2. Now select all wines produced in Italy
reviews.loc[reviews.country == 'Italy']  # equivalent: reviews[reviews.country == 'Italy']
# Add a second condition on points to find Italian wines rated 90 or higher; use & for AND, | for OR
reviews.loc[(reviews.country == 'Italy') & (reviews.points >= 90)]
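# isin(): select wines produced in either Italy or France
reviews.loc[reviews.country.isin(['Italy', 'France'])]
# notnull(): select rows whose price is not missing (NaN)
reviews.loc[reviews.price.notnull()]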
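The filtering patterns above can be checked on a hypothetical four-row frame. Note the parentheses around each comparison: `&` and `|` bind more tightly than `==` and `>=`, so leaving them out raises an error instead of combining the conditions.

```python
import numpy as np
import pandas as pd

# Hypothetical data; one price is missing
df = pd.DataFrame({'country': ['Italy', 'France', 'Italy', 'US'],
                   'points': [92, 88, 85, 90],
                   'price': [30.0, np.nan, 15.0, 20.0]})

good_italian = df.loc[(df.country == 'Italy') & (df.points >= 90)]   # AND
italy_or_good = df.loc[(df.country == 'Italy') | (df.points >= 90)]  # OR
italy_france = df.loc[df.country.isin(['Italy', 'France'])]
priced = df.loc[df.price.notnull()]

print(len(good_italian), len(italy_or_good), len(italy_france), len(priced))  # 1 3 3 3
```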
# assign a constant to a new column; the value is repeated for every row
reviews['critic'] = 'everyone'
reviews.critic
# assign values from an iterable (one value per row): here a countdown from len(reviews) to 1
reviews['index_backwards'] = range(len(reviews), 0, -1)
reviews['index_backwards']
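Both assignment styles on a hypothetical three-row frame: a scalar is broadcast to every row, while an iterable must supply exactly one value per row (a length mismatch raises `ValueError`).

```python
import pandas as pd

# Hypothetical three-row frame
df = pd.DataFrame({'country': ['Italy', 'Portugal', 'US']})

# A scalar is broadcast to every row
df['critic'] = 'everyone'
# An iterable must provide one value per row
df['index_backwards'] = range(len(df), 0, -1)

print(df['critic'].tolist())           # ['everyone', 'everyone', 'everyone']
print(df['index_backwards'].tolist())  # [3, 2, 1]
```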