Python 数据分析基础小结
一、数据读取
1、读写数据库数据
读取函数:
- pandas.read_sql_table(table_name, con, schema=None, index_col=None, coerce_float=True, columns=None)
- pandas.read_sql_query(sql, con, index_col=None, coerce_float=True)
- pandas.read_sql(sql, con, index_col=None, coerce_float=True, columns=None)
- sqlalchemy.creat_engine('数据库产品名+连接工具名://用户名:密码@数据库IP地址:数据库端口号/数据库名称?charset = 数据库数据编码')
写出函数:
- DataFrame.to_sql(name, con, schema=None, if_exists=’fail’, index=True, index_label=None, dtype=None)
2、读写文本文件/csv数据
读取函数:
- pandas.read_table(filepath_or_buffer, sep=’\t’, header=’infer’, names=None, index_col=None, dtype=None, engine=None, nrows=None)
- pandas.read_csv(filepath_or_buffer, sep=’,’, header=’infer’, names=None, index_col=None, dtype=None, engine=None, nrows=None)
写出函数:
- DataFrame.to_csv(path_or_buf=None, sep=’,’, na_rep=”, columns=None, header=True, index=True,index_label=None,mode=’w’,encoding=None)
3、读写excel(xls/xlsx)数据
读取函数:
- pandas.read_excel(io, sheetname=0, header=0, index_col=None, names=None, dtype=None)
写出函数:
- DataFrame.to_excel(excel_writer=None, sheetname=None’, na_rep=”, header=True, index=True, index_label=None, mode=’w’, encoding=None)
4、读取剪贴板数据:
- pandas.read_clipboard()
二、数据预处理
1、数据清洗
重复数据处理
样本重复:
pandas.DataFrame(Series).drop_duplicates(self, subset=None, keep='first', inplace=False)
特征重复:
- 通用
def FeatureEquals(df):
dfEquals=pd.DataFrame([],columns=df.columns,index=df.columns)
for i in df.columns:
for j in df.columns:
dfEquals.loc[i,j]=df.loc[:,i].equals(df.loc[:,j])
return dfEquals
- 数值型特征
def drop_features(data,way = 'pearson',assoRate = 1.0):
'''
此函数用于求取相似度大于assoRate的两列中的一个,主要目的用于去除数值型特征的重复
data:数据框,无默认
assoRate:相似度,默认为1
'''
assoMat = data.corr(method = way)
delCol = []
length = len(assoMat)
for i in range(length):
for j in range(i+1,length):
if assoMat.iloc[i,j] >= assoRate:
delCol.append(assoMat.columns[j])
return(delCol)
缺失值处理
识别缺失值
- DataFrame.isnull()
- DataFrame.notnull()
- DataFrame.isna()
- DataFrame.notna()
处理缺失值
删除:DataFrame.dropna(self, axis=0, how='any', thresh=None, subset=None, inplace=False)
定值填补: DataFrame.fillna(value=None, method=None, axis=None, inplace=False, limit=None)
插补: DataFrame.interpolate(method=’linear’, axis=0, limit=None, inplace=False,limit_direction=’forward’, limit_area=None, downcast=None,**kwargs)
异常值处理
- 3σ原则
def outRange(Ser1):
boolInd = (Ser1.mean()-3*Ser1.std()>Ser1) | (Ser1.mean()+3*Ser1.var()< Ser1)
index = np.arange(Ser1.shape[0])[boolInd]
outrange = Ser1.iloc[index]
return outrange
注: 此方法只适用于正态分布
- 箱线图分析
def boxOutRange(Ser):
'''
Ser:进行异常值分析的DataFrame的某一列
'''
Low = Ser.quantile(0.25)-1.5*(Ser.quantile(0.75)-Ser.quantile(0.25))
Up = Ser.quantile(0.75)+1.5*(Ser.quantile(0.75)-Ser.quantile(0.25))
index = (Ser< Low) | (Ser>Up)
Outlier = Ser.loc[index]
return(Outlier)
2、合并数据
- 数据堆叠:pandas.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, copy=True)
- 主键合并:pandas.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False,suffixes=('_x', '_y'), copy=True, indicator=False)
- 重叠合并:pandas.DataFrame.combine_first(self, other)
3、数据变换
- 哑变量处理:pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False)
- 数据离散化:pandas.cut(x, bins, right=True, labels=None, retbins=False, precision=3, include_lowest=False)
4、数据标准化
- 标准差标准化:sklearn.preprocessing.StandardScaler
- 离差标准化: sklearn.preprocessing.MinMaxScaler
三、模型构建
1、训练集测试集划分
sklearn.model_selection.train_test_split(*arrays, **options)
2、 降维
class sklearn.decomposition.PCA(n_components=None, copy=True, whiten=False, svd_solver=’auto’, tol=0.0, iterated_power=’auto’, random_state=None)
3、交叉验证
sklearn.model_selection.cross_validate(estimator, X, y=None, groups=None, scoring=None, cv=None, n_jobs=1, verbose=0, fit_params=None, pre_dispatch=‘2*n_jobs’, return_train_score=’warn’)
4、模型训练与预测
- 有监督模型
clf = lr.fit(X_train, y_train)
clf.predict(X_test)
5、聚类
常用算法:
- K均值:class sklearn.cluster.KMeans(n_clusters=8, init=’k-means++’, n_init=10, max_iter=300, tol=0.0001, precompute_distances=’auto’, verbose=0, random_state=None, copy_x=True, n_jobs=1, algorithm=’auto’)
- DBSCAN密度聚类:class sklearn.cluster.DBSCAN(eps=0.5, min_samples=5, metric=’euclidean’, metric_params=None, algorithm=’auto’, leaf_size=30, p=None, n_jobs=1)
- Birch层次聚类:class sklearn.cluster.Birch(threshold=0.5, branching_factor=50, n_clusters=3, compute_labels=True, copy=True)
评价:
- 轮廓系数:sklearn.metrics.silhouette_score(X, labels, metric=’euclidean’, sample_size=None, random_state=None, **kwds)
- calinski_harabaz_score:sklearn.metrics.calinski_harabaz_score(X, labels)
- completeness_score:sklearn.metrics.completeness_score(labels_true, labels_pred)
- fowlkes_mallows_score:sklearn.metrics.fowlkes_mallows_score(labels_true, labels_pred, sparse=False)
- homogeneity_completeness_v_measure:sklearn.metrics.homogeneity_completeness_v_measure(labels_true, labels_pred)
- adjusted_rand_score:sklearn.metrics.adjusted_rand_score(labels_true, labels_pred)
- homogeneity_score:sklearn.metrics.homogeneity_score(labels_true, labels_pred)
- mutual_info_score:sklearn.metrics.mutual_info_score(labels_true, labels_pred, contingency=None)
- normalized_mutual_info_score:sklearn.metrics.normalized_mutual_info_score(labels_true, labels_pred)
- v_measure_score:sklearn.metrics.v_measure_score(labels_true, labels_pred)
注:后续含labels_true参数的均需真实值参与
6、分类
常用算法
- Adaboost分类:class sklearn.ensemble.AdaBoostClassifier(base_estimator=None, n_estimators=50, learning_rate=1.0, algorithm=’SAMME.R’, random_state=None)
- 梯度提升树分类:class sklearn.ensemble.GradientBoostingClassifier(loss=’deviance’, learning_rate=0.1, n_estimators=100, subsample=1.0, criterion=’friedman_mse’, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_depth=3, min_impurity_decrease=0.0, min_impurity_split=None, init=None, random_state=None, max_features=None, verbose=0, max_leaf_nodes=None, warm_start=False, presort=’auto’)
- 随机森林分类:class sklearn.ensemble.RandomForestClassifier(n_estimators=10, criterion=’gini’, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=’auto’, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, bootstrap=True, oob_score=False, n_jobs=1, random_state=None, verbose=0, warm_start=False, class_weight=None)
- 高斯过程分类:class sklearn.gaussian_process.GaussianProcessClassifier(kernel=None, optimizer=’fmin_l_bfgs_b’, n_restarts_optimizer=0, max_iter_predict=100, warm_start=False, copy_X_train=True, random_state=None, multi_class=’one_vs_rest’, n_jobs=1)
- 逻辑回归:class sklearn.linear_model.LogisticRegression(penalty=’l2’, dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver=’liblinear’, max_iter=100, multi_class=’ovr’, verbose=0, warm_start=False, n_jobs=1)
- KNN:class sklearn.neighbors.KNeighborsClassifier(n_neighbors=5, weights=’uniform’, algorithm=’auto’, leaf_size=30, p=2, metric=’minkowski’, metric_params=None, n_jobs=1, **kwargs)
- 多层感知神经网络:class sklearn.neural_network.MLPClassifier(hidden_layer_sizes=(100, ), activation=’relu’, solver=’adam’, alpha=0.0001, batch_size=’auto’, learning_rate=’constant’, learning_rate_init=0.001, power_t=0.5, max_iter=200, shuffle=True, random_state=None, tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
- SVM:class sklearn.svm.SVC(C=1.0, kernel=’rbf’, degree=3, gamma=’auto’, coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape=’ovr’, random_state=None)
- 决策树:class sklearn.tree.DecisionTreeClassifier(criterion=’gini’, splitter=’best’, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, class_weight=None, presort=False)
评价:
- 准确率:sklearn.metrics.accuracy_score(y_true, y_pred, normalize=True, sample_weight=None)
- AUC:sklearn.metrics.auc(x, y, reorder=False)
- 分类报告:sklearn.metrics.classification_report(y_true, y_pred, labels=None, target_names=None, sample_weight=None, digits=2)
- 混淆矩阵:sklearn.metrics.confusion_matrix(y_true, y_pred, labels=None, sample_weight=None)
- kappa:sklearn.metrics.cohen_kappa_score(y1, y2, labels=None, weights=None, sample_weight=None)
- F1值:sklearn.metrics.f1_score(y_true, y_pred, labels=None, pos_label=1, average=’binary’, sample_weight=None)
- 精确率:sklearn.metrics.precision_score(y_true, y_pred, labels=None, pos_label=1, average=’binary’, sample_weight=None)
- 召回率:sklearn.metrics.recall_score(y_true, y_pred, labels=None, pos_label=1, average=’binary’, sample_weight=None)
- ROC:sklearn.metrics.roc_curve(y_true, y_score, pos_label=None, sample_weight=None, drop_intermediate=True)
7、回归
常用算法:
- Adaboost回归:class sklearn.ensemble.AdaBoostRegressor(base_estimator=None, n_estimators=50, learning_rate=1.0, loss=’linear’, random_state=None)
- 梯度提升树回归:class sklearn.ensemble.GradientBoostingRegressor(loss=’ls’, learning_rate=0.1, n_estimators=100, subsample=1.0, criterion=’friedman_mse’, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_depth=3, min_impurity_decrease=0.0, min_impurity_split=None, init=None, random_state=None, max_features=None, alpha=0.9, verbose=0, max_leaf_nodes=None, warm_start=False, presort=’auto’)
- 随机森林回归:class sklearn.ensemble.RandomForestRegressor(n_estimators=10, criterion=’mse’, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=’auto’, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, bootstrap=True, oob_score=False, n_jobs=1, random_state=None, verbose=0, warm_start=False)
- 高斯过程回归:class sklearn.gaussian_process.GaussianProcessRegressor(kernel=None, alpha=1e-10, optimizer=’fmin_l_bfgs_b’, n_restarts_optimizer=0, normalize_y=False, copy_X_train=True, random_state=None)
- 保序回归:class sklearn.isotonic.IsotonicRegression(y_min=None, y_max=None, increasing=True, out_of_bounds=’nan’)
- Lasso回归:class sklearn.linear_model.Lasso(alpha=1.0, fit_intercept=True, normalize=False, precompute=False, copy_X=True, max_iter=1000, tol=0.0001, warm_start=False, positive=False, random_state=None, selection=’cyclic’)
- 线性回归:class sklearn.linear_model.LinearRegression(fit_intercept=True, normalize=False, copy_X=True, n_jobs=1)
- 岭回归: class sklearn.linear_model.Ridge(alpha=1.0, fit_intercept=True, normalize=False, copy_X=True, max_iter=None, tol=0.001, solver=’auto’, random_state=None)
- KNN回归:class sklearn.neighbors.KNeighborsRegressor(n_neighbors=5, weights=’uniform’, algorithm=’auto’, leaf_size=30, p=2, metric=’minkowski’, metric_params=None, n_jobs=1, **kwargs)
- 多层感知神经网络回归:class sklearn.neural_network.MLPRegressor(hidden_layer_sizes=(100, ), activation=’relu’, solver=’adam’, alpha=0.0001, batch_size=’auto’, learning_rate=’constant’, learning_rate_init=0.001, power_t=0.5, max_iter=200, shuffle=True, random_state=None, tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08)
- SVM回归:class sklearn.svm.SVR(kernel=’rbf’, degree=3, gamma=’auto’, coef0=0.0, tol=0.001, C=1.0, epsilon=0.1, shrinking=True, cache_size=200, verbose=False, max_iter=-1)
- 决策树回归:class sklearn.tree.DecisionTreeRegressor(criterion=’mse’, splitter=’best’, max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, presort=False)
评价:
- 可解释方差值:sklearn.metrics.explained_variance_score(y_true, y_pred, sample_weight=None, multioutput=’uniform_average’)
- 平均绝对误差:sklearn.metrics.mean_absolute_error(y_true, y_pred, sample_weight=None, multioutput=’uniform_average’)[source]
- 均方误差:sklearn.metrics.mean_squared_error(y_true, y_pred, sample_weight=None, multioutput=’uniform_average’)
- 均方对数误差:sklearn.metrics.mean_squared_log_error(y_true, y_pred, sample_weight=None, multioutput=’uniform_average’)
- 中值绝对误差:sklearn.metrics.median_absolute_error(y_true, y_pred)
- R²值:sklearn.metrics.r2_score(y_true, y_pred, sample_weight=None, multioutput=’uniform_average’)
八、demo
from sklearn import neighbors, datasets, preprocessing
from sklearn.cross_validation import train_test_split
from sklearn.metrics import accuracy_score
iris = datasets.load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=33)
scaler = preprocessing.StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
knn = neighbors.KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
accuracy_score(y_test, y_pred)
四、绘图
1、创建画布或子图
函数名称 | 函数作用 |
---|---|
plt.figure | 创建一个空白画布,可以指定画布大小,像素。 |
figure.add_subplot | 创建并选中子图,可以指定子图的行数,列数,与选中图片编号。 |
2、绘制
函数名称 | 函数作用 |
---|---|
plt.title | 在当前图形中添加标题,可以指定标题的名称,位置,颜色,字体大小等参数。 |
plt.xlabel | 在当前图形中添加x轴名称,可以指定位置,颜色,字体大小等参数。 |
plt.ylabel | 在当前图形中添加y轴名称,可以指定位置,颜色,字体大小等参数。 |
plt.xlim | 指定当前图形x轴的范围,只能确定一个数值区间,而无法使用字符串标识。 |
plt.ylim | 指定当前图形y轴的范围,只能确定一个数值区间,而无法使用字符串标识。 |
plt.xticks | 指定x轴刻度的数目与取值 |
plt.yticks | 指定y轴刻度的数目与取值 |
plt.legend | 指定当前图形的图例,可以指定图例的大小,位置,标签。 |
3、中文
plt.rcParams['font.sans-serif'] = 'SimHei' ##设置字体为SimHei显示中文
plt.rcParams['axes.unicode_minus'] = False ##设置正常显示符号
4、不同图形
- 散点图:matplotlib.pyplot.scatter(x, y, s=None, c=None, marker=None, cmap=None, norm=None, vmin=None, vmax=None, alpha=None, linewidths=None, verts=None, edgecolors=None, hold=None, data=None,**kwargs)
- 折线图: matplotlib.pyplot.plot(*args, **kwargs)
- 直方图:matplotlib.pyplot.bar(left,height,width = 0.8,bottom = None,hold = None,data = None,** kwargs )
- 饼图:matplotlib.pyplot.pie(x, explode=None, labels=None, colors=None, autopct=None, pctdistance=0.6, shadow=False, labeldistance=1.1, startangle=None, radius=None, counterclock=True, wedgeprops=None, textprops=None, center=(0, 0), frame=False, hold=None, data=None)
- 箱线图:matplotlib.pyplot.boxplot(x, notch=None, sym=None, vert=None, whis=None, positions=None, widths=None, patch_artist=None, bootstrap=None, usermedians=None, conf_intervals=None, meanline=None, showmeans=None, showcaps=None, showbox=None, showfliers=None, boxprops=None, labels=None, flierprops=None, medianprops=None, meanprops=None, capprops=None, whiskerprops=None, manage_xticks=True, autorange=False, zorder=None, hold=None, data=None)
5、Demo
import numpy as np
import matplotlib.pyplot as plt
box = dict(facecolor='yellow', pad=5, alpha=0.2)
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2)
fig.subplots_adjust(left=0.2, wspace=0.6)
# Fixing random state for reproducibility
np.random.seed(19680801)
ax1.plot(2000*np.random.rand(10))
ax1.set_title('ylabels not aligned')
ax1.set_ylabel('misaligned 1', bbox=box)
ax1.set_ylim(0, 2000)
ax3.set_ylabel('misaligned 2',bbox=box)
ax3.plot(np.random.rand(10))
labelx = -0.3 # axes coords
ax2.set_title('ylabels aligned')
ax2.plot(2000*np.random.rand(10))
ax2.set_ylabel('aligned 1', bbox=box)
ax2.yaxis.set_label_coords(labelx, 0.5)
ax2.set_ylim(0, 2000)
ax4.plot(np.random.rand(10))
ax4.set_ylabel('aligned 2', bbox=box)
ax4.yaxis.set_label_coords(labelx, 0.5)
plt.show()
五、完整Demo
import numpy as np
import pandas as pd
airline_data = pd.read_csv("../data/air_data.csv",
encoding="gb18030") #导入航空数据
print('原始数据的形状为:',airline_data.shape)
## 去除票价为空的记录
exp1 = airline_data["SUM_YR_1"].notnull()
exp2 = airline_data["SUM_YR_2"].notnull()
exp = exp1 & exp2
airline_notnull = airline_data.loc[exp,:]
print('删除缺失记录后数据的形状为:',airline_notnull.shape)
#只保留票价非零的,或者平均折扣率不为0且总飞行公里数大于0的记录。
index1 = airline_notnull['SUM_YR_1'] != 0
index2 = airline_notnull['SUM_YR_2'] != 0
index3 = (airline_notnull['SEG_KM_SUM']> 0) & \
(airline_notnull['avg_discount'] != 0)
airline = airline_notnull[(index1 | index2) & index3]
print('删除异常记录后数据的形状为:',airline.shape)
airline_selection = airline[["FFP_DATE","LOAD_TIME",
"FLIGHT_COUNT","LAST_TO_END",
"avg_discount","SEG_KM_SUM"]]
## 构建L特征
L = pd.to_datetime(airline_selection["LOAD_TIME"]) - \
pd.to_datetime(airline_selection["FFP_DATE"])
L = L.astype("str").str.split().str[0]
L = L.astype("int")/30
## 合并特征
airline_features = pd.concat([L,
airline_selection.iloc[:,2:]],axis = 1)
print('构建的LRFMC特征前5行为:\n',airline_features.head())
from sklearn.preprocessing import StandardScaler
data = StandardScaler().fit_transform(airline_features)
np.savez('../tmp/airline_scale.npz',data)
print('标准化后LRFMC五个特征为:\n',data[:5,:])
from sklearn.cluster import KMeans #导入kmeans算法
airline_scale = np.load('../tmp/airline_scale.npz')['arr_0']
k = 5 ## 确定聚类中心数
#构建模型
kmeans_model = KMeans(n_clusters = k,n_jobs=4,random_state=123)
fit_kmeans = kmeans_model.fit(airline_scale) #模型训练
kmeans_model.cluster_centers_ #查看聚类中心
kmeans_model.labels_ #查看样本的类别标签
#统计不同类别样本的数目
r1 = pd.Series(kmeans_model.labels_).value_counts()
print('最终每个类别的数目为:\n',r1)
#绘制直方图矩阵
center = kmeans_model.cluster_centers_
names = ['入会时长','最近乘坐过本公司航班','乘坐次数','里程','平均折扣率']
import matplotlib.pyplot as plt
%matplotlib inline
ax = plt.figure(figsize=(8,8))
for i in range(k):
ax1 = ax.add_subplot(k,1,i+1)
plt.bar(range(5),center[:,i],width = 0.5)
plt.xlabel('类别')
plt.ylabel(names[i])
plt.savefig('聚类分析柱形图.png')
plt.show()
#绘制雷达图
fig = plt.figure(figsize=(8,8))
ax = fig.add_subplot(111, polar=True)# polar参数
angles = np.linspace(0, 2*np.pi, k, endpoint=False)
angles = np.concatenate((angles, [angles[0]])) # 闭合
Linecolor = ['bo-','r+:','gD--','yv-.','kp-'] #点线颜色
Fillcolor = ['b','r','g','y','k']
for i in range(k):
data = np.concatenate((center[i], [center[i][0]])) # 闭合
ax.plot(angles,data,Linecolor[i], linewidth=2)# 画线
ax.fill(angles, data, facecolor=Fillcolor[i], alpha=0.25)# 填充
ax.set_thetagrids(angles * 180/np.pi, names)
ax.set_title("客户分群雷达图", va='bottom')## 设定标题
ax.set_rlim(-1,3)## 设置各指标的最终范围
ax.grid(True)
Python 数据分析基础小结的更多相关文章
- Python数据分析基础教程
Python数据分析基础教程(第2版)(高清版)PDF 百度网盘 链接:https://pan.baidu.com/s/1_FsReTBCaL_PzKhM0o6l0g 提取码:nkhw 复制这段内容后 ...
- Python数据分析基础PDF
Python数据分析基础(高清版)PDF 百度网盘 链接:https://pan.baidu.com/s/1ImzS7Sy8TLlTshxcB8RhdA 提取码:6xeu 复制这段内容后打开百度网盘手 ...
- Numpy使用大全(python矩阵相关运算大全)-Python数据分析基础2
//2019.07.10python数据分析基础——numpy(数据结构基础) import numpy as np: 1.python数据分析主要的功能实现模块包含以下六个方面:(1)numpy—— ...
- python数据分析基础
---恢复内容开始--- Python数据分析基础(1) //2019.07.09python数据分析基础总结1.python数据分析主要使用IDE是Pycharm和Anaconda,最为常用和方便的 ...
- python 数据分析基础
安装Python基础的几个数据分析库: pip install pandas pip install numpy pip install scipy pip install scikit-surpri ...
- Python数据分析基础——读写CSV文件2
2.2筛选特定的行: 行中的值满足某个条件 行中的值属于某个集合 行中的值匹配于某个模式(即:正则表达式) 2.2.1:行中的值满足于某个条件: 基础python版: #!/usr/bin/env p ...
- Python数据分析基础——读写CSV文件
1.基础python代码: #!/usr/bin/env python3 # 可以使脚本在不同的操作系统之间具有可移植性 import sys # 导入python的内置sys模块,使得在命令行中向脚 ...
- python数据分析基础——numpy和matplotlib
numpy库是python的一个著名的科学计算库,本文是一个quickstart. 引入:计算BMI BMI = 体重(kg)/身高(m)^2假如有如下几组体重和身高数据,让求每组数据的BMI值: w ...
- Python数据分析基础——Numpy tutorial
参考link https://docs.scipy.org/doc/numpy-dev/user/quickstart.html 基础 Numpy主要用于处理多维数组,数组中元素通常是数字,索引值为 ...
随机推荐
- Facade外观模式(结构性模式)
1.系统的复杂度 需求:开发一个坦克模拟系统用于模拟坦克车在各种作战环境中的行为,其中坦克系统由引擎.控制器.车轮等各子系统构成.然后由对应的子系统调用. 常规的设计如下: #region 坦克系统组 ...
- 微信小程序图片变形解决方法
微信小程序的image标签中有个mode属性,使用aspectFill即可 注:image组件默认宽度300px.高度225px mode 有效值: mode 有 13 种模式,其中 4 种是缩放模式 ...
- PHP多进程系列笔记(一)
本系列文章将向大家讲解pcntl_*系列函数,从而更深入的理解进程相关知识. PCNTL在PHP中进程控制支持默认是关闭的.您需要使用 --enable-pcntl 配置选项重新编译PHP的 CGI或 ...
- Mysql索引会失效的几种情况分析
转:https://www.jb51.net/article/50649.htm 学习啦
- Java程序读取Properties文件
一.如果将properties文件保存在src目录下 1.读取文件代码如下: /** * 配置文件为settings.properties * YourClassName对应你类的名字 * / pri ...
- gateway-workman
最外层start.php,设置全局启动模式,加载Application里的个子服务目录下应用的启动文件(start开头,这些文件都是workman\work类的子类,在载入文件的同时,这些子服务会生成 ...
- Java NIO系列教程(十二) Java NIO与IO
当学习了Java NIO和IO的API后,一个问题马上涌入脑海: 我应该何时使用IO,何时使用NIO呢?在本文中,我会尽量清晰地解析Java NIO和IO的差异.它们的使用场景,以及它们如何影响您的代 ...
- 【JS】input输入框只能输入数字
一.实现思路 input只能输入纯数字的思路其实很简单,监听输入框值的变化,每次输入检索输入框的值,将非数字的字段替换成空,再将此值赋予给输入框. 关键代码: \d:匹配数字 ^/d:全文匹配非数字 ...
- 扒一扒HTTPS网站的内幕
215年6月,维基媒体基金会发布公告,旗下所有网站将默认开启HTTPS,这些网站中最为人所知的当然是全球最大的在线百科-维基百科.而更早时候的3月,百度已经发布公告,百度全站默认开启HTTPS.淘宝也 ...
- core Animation之CAKeyframeAnimation(关键帧动画)
CABasicAnimation的区别是:CABasicAnimation只能从一个数值(fromValue)变到另一个数值(toValue),而CAKeyframeAnimation会使用一个NSA ...