Python Data Analysis Library — pandas: Python Data Analysis Library

  • https://pandas.pydata.org/
  • pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
  • pandas: powerful Python data analysis toolkit — pandas 0.22.0 documentation
    • http://pandas.pydata.org/pandas-docs/stable/index.html
  • 10 Minutes to pandas — pandas 0.22.0 documentation
    • http://pandas.pydata.org/pandas-docs/stable/10min.html#
    • This is a short introduction to pandas, geared mainly for new users. You can see more complex recipes in the Cookbook

pandas · GitHub

  • https://github.com/pandas-dev
  • Powerful data manipulation tools for Python
pandas (software) - Wikipedia

pandas - Baidu Baike

  • https://baike.baidu.com/item/pandas
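  • Python Data Analysis Library, or pandas, is a tool built on NumPy and created for data analysis tasks. pandas incorporates a large number of libraries and some standard data models, and provides the tools needed to operate on large datasets efficiently. It offers many functions and methods for processing data quickly and conveniently, and it is one of the key reasons Python is a powerful and efficient data analysis environment.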

Study notes: pandas Foundations | DataCamp - Pegasus923 - cnblogs

  • https://www.cnblogs.com/pegasus923/p/9017799.html

Resource | 23 core Pandas operations: have you gone through them all? - Machine Learning Algorithms and Python Learning (机器学习算法与Python学习)

  • https://mp.weixin.qq.com/s/klGFyKngYnwZYfhhLne8Sg
  • https://towardsdatascience.com/23-great-pandas-codes-for-data-scientists-cca5ed9d8a38
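  • Pandas is a Python software library that provides many functions and methods for processing data quickly and conveniently. Generally speaking, Pandas is one of the key reasons Python is a powerful and efficient data analysis environment. In this article, the author presents 23 core Pandas methods from three angles: basic dataset reading and writing, data processing, and DataFrame operations.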

Introductory tutorial for pandas, the Python data processing library - Data Analysis and Development (数据分析与开发)

  • https://mp.weixin.qq.com/s/Qd9lqngAiD2AYVLvV54Xwg
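  • pandas is a Python package and a very commonly used foundational library when doing machine learning programming in Python. This article is an introductory tutorial for it.
  • pandas provides fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive. It aims to be the high-level building block for practical data analysis in Python.
  • Introduction
  • Core data structures
  • Series
  • DataFrame
  • Index objects and data access
  • File operations
  • Reading Excel files
  • Reading CSV files
  • Handling invalid values
  • Ignoring invalid values
  • Replacing invalid values
  • Handling strings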

Advanced tutorial for pandas, the Python data processing library - Data Analysis and Development (数据分析与开发)

  • https://mp.weixin.qq.com/s/_8b5sdvpMVR_M0XuEezrOQ
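  • Data access
    • Basic methods: [] and .
    • loc and iloc
    • at and iat
  • Index objects
    • MultiIndex
  • Data integration
    • Concat and Append
    • Merge and Join
  • Data aggregation and grouping operations
  • Time-related functionality
  • Plotting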

Python Pandas Functions in Parallel - Data and Stuff by Jay

  • http://www.racketracer.com/2016/07/06/pandas-in-parallel/
  • I’m always on the lookout for quick hacks and code snippets that might help improve efficiency. Most of the time that’s through stackoverflow but here’s one that deals with parallelization and efficiency that I thought would be helpful.
  • Since Pandas doesn’t have an internal parallelism feature yet, it makes doing apply functions with huge datasets a pain if the functions have expensive computation times. One way to shorten that amount of time is to split the dataset into separate pieces, perform the apply function, and then re-concatenate the pandas dataframes.
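The split / apply / re-concatenate idea described above can be sketched as follows. This is a minimal illustration, not the linked article's code: `parallel_apply` and `add_double` are hypothetical names, and a thread pool is used only to keep the example self-contained. For CPU-bound functions a process pool (for example `concurrent.futures.ProcessPoolExecutor`) would usually be the better fit, provided the function can be pickled.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd


def parallel_apply(df, func, n_parts=4):
    """Split df into row chunks, run func on each chunk, and re-concatenate."""
    # Row boundaries of the chunks, e.g. [0, 3, 6, 10] for 10 rows and 3 parts.
    bounds = np.linspace(0, len(df), n_parts + 1, dtype=int)
    chunks = [df.iloc[lo:hi] for lo, hi in zip(bounds[:-1], bounds[1:])]
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        results = list(pool.map(func, chunks))  # map keeps the chunk order
    return pd.concat(results)


def add_double(part):
    # Hypothetical per-chunk work; assign returns a new frame.
    return part.assign(y=part["x"] * 2)


df = pd.DataFrame({"x": range(10)})
out = parallel_apply(df, add_double, n_parts=3)
```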

Pandas核心操作

  • https://mp.weixin.qq.com/s/2a_xS-BuPOpNCw3ZNZuYnQ

Comparison with SQL — pandas 0.23.3 documentation

  • https://pandas.pydata.org/pandas-docs/stable/comparison_with_sql.html#
  • How to rewrite your SQL queries in Pandas, and more ?
    • https://codeburst.io/how-to-rewrite-your-sql-queries-in-pandas-and-more-149d341fc53e

How to print all elements in a dataframe ?

  • python - Is there a way to pretty print an entire Pandas Series / DataFrame? - Stack Overflow
  • https://stackoverflow.com/questions/19124601/is-there-a-way-to-pretty-print-an-entire-pandas-series-dataframe
  • print(df.to_string())

How to get all column names of a dataframe?

  • list( df )

How to check if a dataframe column exists ?

  • python - How to check if a column exists in Pandas - Stack Overflow

    • https://stackoverflow.com/questions/24870306/how-to-check-if-a-column-exists-in-pandas
    • if 'A' in df.columns:

How to check if a dataframe column / Series is empty ?

  • python - How to check if pandas Series is empty? - Stack Overflow

    • https://stackoverflow.com/questions/24652417/how-to-check-if-pandas-series-is-empty
    • df.empty
    • df.dropna().empty
  • pandas.DataFrame.dropna — pandas 0.23.3 documentation
    • https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html#pandas-dataframe-dropna
    • DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
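A small sketch of the difference between the two checks, using made-up column names: `.empty` only reports whether there are zero rows, so a column holding nothing but NaN is not "empty" until the NaN values are dropped.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [np.nan, np.nan], "b": [1.0, np.nan]})

a_has_no_rows = df["a"].empty             # False: two rows exist, both NaN
a_has_no_values = df["a"].dropna().empty  # True: nothing left after dropping NaN
b_has_no_values = df["b"].dropna().empty  # False: one real value remains
```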

How to find unique value in a column of dataframe ?

  • pandas.unique — pandas 0.22.0 documentation

    • https://pandas.pydata.org/pandas-docs/stable/generated/pandas.unique.html#pandas.unique
  • pandas.Series.tolist — pandas 0.23.1 documentation
    • https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.tolist.html#pandas-series-tolist

  • List Unique Values In A pandas Column

    • https://chrisalbon.com/python/data_wrangling/pandas_list_unique_values_in_column/

How to query a specified column / panel ?

  • pandas.DataFrame.loc — pandas 0.23.0 documentation

    • http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.loc.html?highlight=loc#pandas.DataFrame.loc
  • Indexing and Selecting Data — pandas 0.23.1 documentation
    • http://pandas.pydata.org/pandas-docs/stable/indexing.html#different-choices-for-indexing
  • Slice with labels for row and single label for column. As mentioned above, note that both the start and stop of the slice are included.
    • >>> df.loc['cobra':'viper', 'max_speed']
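To make the inclusive label slice concrete, here is a short sketch with a small frame modeled on the one used in the pandas documentation for `loc`; the values are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2], [4, 5], [7, 8]],
    index=["cobra", "viper", "sidewinder"],
    columns=["max_speed", "shield"],
)

# Label slices include both endpoints, unlike positional slices.
speeds = df.loc["cobra":"viper", "max_speed"]
```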

How to get a specific column as series / dataframe ?

  • series

    • df[ 'col1' ] or df.col1
    • df[ [c for c in df.columns if c.startswith('a')][0] ]
  • dataframe
    • df[ [ 'col1' ] ]
    • df[ [c for c in df.columns if c.startswith('a')] ]
  • Choosing columns in pandas DataFrame – Kasia Rachuta – Medium
    • https://medium.com/@kasiarachuta/choosing-columns-in-pandas-dataframe-d0677b34a6ca
    • df[ 'col1' ]
    • This command picks a column and returns it as a Series
    • df[ [ 'col1' ] ]
    • Here, I chose the column and I get a DataFrame

How to get the last row / value of dataframe ?

  • How to get the last n row of pandas dataframe? - Stack Overflow

    • https://stackoverflow.com/questions/14663004/how-to-get-the-last-n-row-of-pandas-dataframe
    • df.iloc[ -1 ]
    • df.tail( 1 )
  • pandas.DataFrame.tail — pandas 0.23.3 documentation
    • http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.tail.html#pandas.DataFrame.tail
    • DataFrame.tail(n=5)
  • pandas.DataFrame.iloc — pandas 0.23.3 documentation
    • http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iloc.html#pandas.DataFrame.iloc
  • python - obtaining last value of dataframe column without index - Stack Overflow
    • https://stackoverflow.com/questions/34166030/obtaining-last-value-of-dataframe-column-without-index
    • df.column.iat[ -1 ]
  • Indexing and Selecting Data — pandas 0.23.3 documentation
    • http://pandas.pydata.org/pandas-docs/stable/indexing.html#fast-scalar-value-getting-and-setting
  • python - Loc vs. iloc vs. ix vs. at vs. iat? - Stack Overflow
    • https://stackoverflow.com/questions/28757389/loc-vs-iloc-vs-ix-vs-at-vs-iat
    • loc - label based
    • iloc - position based
    • at: get scalar values. It's a very fast loc
    • iat: Get scalar values. It's a very fast iloc
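The accessors listed above can be compared side by side. A minimal sketch with an invented `price` column:

```python
import pandas as pd

df = pd.DataFrame({"price": [10, 20, 30]}, index=["a", "b", "c"])

last_row_series = df.iloc[-1]       # last row as a Series
last_row_frame = df.tail(1)         # last row as a one-row DataFrame
last_value = df["price"].iat[-1]    # scalar by position
same_value = df.at["c", "price"]    # scalar by label
```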

How to get scalar value of a panel with condition ?

  • id = df.loc[a==b, 'id'].values[0]
  • id = df[a==b]['id'].iat[0]
  • pandas.Panel.iat — pandas 0.23.4 documentation
    • https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Panel.iat.html?highlight=iat#pandas.Panel.iat
    • Access a single value for a row/column pair by integer position.
    • Similar to iloc, in that both provide integer-based lookups. Use iat if you only need to get or set a single value in a DataFrame or Series.
  • python - How to get scalar value on a cell using conditional indexing - Stack Overflow
    • https://stackoverflow.com/questions/30813088/how-to-get-scalar-value-on-a-cell-using-conditional-indexing
    • get at the underlying numpy matrix using .values on a series or dataframe

How to get count of rows in dataframe ?

  • len( df )
  • Built-in Functions — Python 3.7.2 documentation
    • https://docs.python.org/3/library/functions.html#len
    • Return the length (the number of items) of an object. The argument may be a sequence (such as a string, bytes, tuple, list, or range) or a collection (such as a dictionary, set, or frozen set).
  • pandas python how to count the number of records or rows in a dataframe - Stack Overflow
    • https://stackoverflow.com/questions/17468878/pandas-python-how-to-count-the-number-of-records-or-rows-in-a-dataframe/41968240
    • To get the number of rows in a dataframe use:
    • df.shape[0]
    • (and df.shape[1] to get the number of columns).
    • As an alternative you can use
    • len(df)
    • or
    • len(df.index)
    • (and len(df.columns) for the columns)
    • shape is more versatile and more convenient than len(), especially for interactive work (just needs to be added at the end), but len is a bit faster (see also this answer).
    • To avoid: count() because it returns the number of non-NA/null observations over requested axis
    • len(df.index) is faster

How to create a new column with applying function on the existing columns ?

  • df['new'] = df.apply(lambda x : myfunc(x['old']), axis='columns')
  • pandas.DataFrame.apply — pandas 0.23.4 documentation
    • https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html
    • DataFrame.apply(func, axis=0, broadcast=None, raw=False, reduce=None, result_type=None, args=(), **kwds)
    • Apply a function along an axis of the DataFrame.
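A runnable version of the pattern above, assuming a hypothetical `to_label` function in place of `myfunc`. When the function only needs one column, `Series.map` gives the same result without building a row object per call.

```python
import pandas as pd


def to_label(value):
    # Hypothetical stand-in for myfunc.
    return "high" if value >= 10 else "low"


df = pd.DataFrame({"old": [5, 12, 10]})

# Row-wise apply: each row is passed to the lambda as a Series.
df["new"] = df.apply(lambda row: to_label(row["old"]), axis="columns")

# Single-column alternative.
df["new_from_map"] = df["old"].map(to_label)
```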

How to look up the first match element ?

  • python - lookup first match in Pandas dataframe - Stack Overflow

    • https://stackoverflow.com/questions/46371391/lookup-first-match-in-pandas-dataframe
    • westcoast.loc[westcoast.state=='Oregon', 'capital'].item()
    • s = westcoast.loc[westcoast.state=='Oregon', 'capital']
    • s = np.nan if s.empty else s.iat[0]
  • pandas.Series.item — pandas 0.23.1 documentation
    • https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.item.html
  • pandas.DataFrame.iat — pandas 0.23.1 documentation
    • https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iat.html
  • pandas.DataFrame.empty — pandas 0.23.1 documentation
    • http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.empty.html?highlight=pandas%20dataframe%20empty#pandas-dataframe-empty
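The `empty` / `iat[0]` fallback shown above can be wrapped in a small helper. `first_capital` is a hypothetical name and the data is a toy version of the `westcoast` frame from the question.

```python
import numpy as np
import pandas as pd

westcoast = pd.DataFrame(
    {
        "state": ["California", "Oregon", "Washington"],
        "capital": ["Sacramento", "Salem", "Olympia"],
    }
)


def first_capital(frame, state):
    """Return the first matching capital, or NaN when nothing matches."""
    matches = frame.loc[frame["state"] == state, "capital"]
    return np.nan if matches.empty else matches.iat[0]


found = first_capital(westcoast, "Oregon")
missing = first_capital(westcoast, "Nevada")
```

Unlike `.item()`, this does not raise when there are zero or several matches.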

How to find index where elements should be inserted to maintain order ?

  • pandas.Series.searchsorted — pandas 0.23.3 documentation

    • http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.searchsorted.html?highlight=searchsorted#pandas.Series.searchsorted
    • Series.searchsorted(value, side='left', sorter=None)
  • pandas.Index.searchsorted — pandas 0.23.3 documentation
    • http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Index.searchsorted.html?highlight=searchsorted#pandas.Index.searchsorted
    • Index.searchsorted(value, side='left', sorter=None)
  • Essential Basic Functionality — pandas 0.23.3 documentation
    • http://pandas.pydata.org/pandas-docs/stable/basics.html#searchsorted
  • Cookbook — pandas 0.23.3 documentation
    • http://pandas.pydata.org/pandas-docs/stable/cookbook.html?highlight=searchsorted#merge
  • python - Pandas merge with logic - Stack Overflow
    • https://stackoverflow.com/questions/25125626/pandas-merge-with-logic/2512764
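A brief sketch of `searchsorted` on an already sorted Series, showing how `side` changes the result when the value is already present. The numbers are illustrative.

```python
import pandas as pd

s = pd.Series([1, 3, 5, 7])  # must already be sorted

left = s.searchsorted([4, 5])                 # insert before equal values
right = s.searchsorted([4, 5], side="right")  # insert after equal values
```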

How to reset index ?

  • pandas.DataFrame.reset_index — pandas 0.23.4 documentation

    • https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.reset_index.html
    • DataFrame.reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')
      • drop : boolean, default False

        • Do not try to insert index into dataframe columns. This resets the index to the default integer index.

How to sort by values ?

  • pandas.DataFrame.sort_values — pandas 0.23.1 documentation

    • https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_values.html
    • DataFrame.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last')

How to group by ?

    • Group By: split-apply-combine — pandas 0.23.0 documentation

      • https://pandas.pydata.org/pandas-docs/stable/groupby.html
    • pandas.DataFrame.groupby — pandas 0.23.0 documentation
      • http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html?highlight=groupby#pandas.DataFrame.groupby
      • DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, observed=False, **kwargs)
      • as_index : boolean, default True
      • For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output
      • sort : boolean, default True
      • Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. groupby preserves the order of rows within each group.
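The effect of `as_index` and `sort` is easiest to see on a tiny frame. A sketch with made-up keys:

```python
import pandas as pd

df = pd.DataFrame({"k": ["b", "a", "b"], "v": [1, 2, 3]})

indexed = df.groupby("k")["v"].sum()                # keys become the index, sorted
flat = df.groupby("k", as_index=False)["v"].sum()   # "SQL-style": keys stay a column
unsorted = df.groupby("k", sort=False)["v"].sum()   # keys in order of first appearance
```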

How to extract features by grouping columns ?

  • df_mean = (df.groupby('id').col.mean().rename('mean_col'))
  • 19 Essential Snippets in Pandas - 16. Extracting Features by Grouping Columns
    • https://jeffdelaney.me/blog/useful-snippets-in-pandas/
    • df.groupby('topping')['discount'].apply(lambda x: np.mean(x))
  • pandas.Series.rename — pandas 0.23.4 documentation
    • https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.rename.html
    • Series.rename(index=None, **kwargs)
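One way to use the renamed group mean as a feature is to join it back onto the original rows. This is a sketch under assumed column names (`id`, `col`), not the linked article's code.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 1, 2], "col": [10.0, 20.0, 30.0]})

# Per-group mean as a named Series indexed by id.
df_mean = df.groupby("id")["col"].mean().rename("mean_col")

# Attach the group-level feature to every row of its group.
df_features = df.join(df_mean, on="id")
```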

How to delete rows from dataframe permanently ?

  • pandas.DataFrame.drop — pandas 0.23.1 documentation

    • http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop.html#pandas.DataFrame.drop
    • DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
  • How to Delete a Row from a Pandas Dataframe Object in Python
    • http://www.learningaboutelectronics.com/Articles/How-to-delete-a-row-from-a-pandas-dataframe-object-in-Python.php
    • dataframe1.drop('D', inplace=True)

How to drop columns with specified names / list ?

  • cols = [c for c in df.columns if not c.startswith( ('col1', 'col2') ) ]
  • cols = [c for c in df.columns if not any( f in c for f in list_f ) ]
  • df = df[cols]
  • pandas.DataFrame.rename — pandas 0.23.4 documentation
    • https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rename.html
    • DataFrame.rename(mapper=None, index=None, columns=None, axis=None, copy=True, inplace=False, level=None)
  • Built-in Types — Python 3.7.1 documentation - str.startswith(prefix[, start[, end]])
    • https://docs.python.org/3/library/stdtypes.html#str.startswith
    • Return True if string starts with the prefix, otherwise return False. prefix can also be a tuple of prefixes to look for. With optional start, test string beginning at that position. With optional end, stop comparing string at that position.
  • python - Pandas dataframe: drop columns whose name contains a specific string - Stack Overflow
    • https://stackoverflow.com/questions/19071199/pandas-dataframe-drop-columns-whose-name-contains-a-specific-string
  • python - Check if multiple strings exist in another string - Stack Overflow
    • https://stackoverflow.com/questions/3389574/check-if-multiple-strings-exist-in-another-string
    • You can use any: if any(x in str for x in a):
    • Similarly to check if all the strings from the list are found, use all instead of any.
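The two list comprehensions above can be chained. A minimal sketch, assuming invented column names and a hypothetical `list_f` of substrings to exclude:

```python
import pandas as pd

df = pd.DataFrame(columns=["col1_a", "col2_b", "keep", "other_x"])
list_f = ["_x"]  # hypothetical substrings that mark unwanted columns

# str.startswith accepts a tuple of prefixes.
cols = [c for c in df.columns if not c.startswith(("col1", "col2"))]
# any(...) drops a column if it contains any of the substrings.
cols = [c for c in cols if not any(f in c for f in list_f)]
df = df[cols]
```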

How to drop duplicates ?

  • pandas.DataFrame.drop_duplicates — pandas 0.22.0 documentation

    • https://pandas.pydata.org/pandas-docs/version/0.22/generated/pandas.DataFrame.drop_duplicates.html
    • DataFrame.drop_duplicates(subset=None, keep='first', inplace=False)
    • grouped = grouped.drop_duplicates(['A', 'B'])
  • Drop all duplicate rows in Python Pandas - Stack Overflow
    • https://stackoverflow.com/questions/23667369/drop-all-duplicate-rows-in-python-pandas
  • Note that it will drop all duplicates. So an issue will occur if you just want to drop consecutive duplicates.
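A sketch of how `keep` changes which rows survive, with toy data. Note that the fourth row is removed even though it is not adjacent to the first, which is the behaviour the note above warns about.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 1, 2, 1], "B": ["x", "x", "y", "x"], "C": [1, 2, 3, 4]}
)

first_kept = df.drop_duplicates(["A", "B"])             # keep first of each (A, B)
none_kept = df.drop_duplicates(["A", "B"], keep=False)  # drop every duplicated (A, B)
```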

How to drop consecutive duplicates ?

  • pandas.DataFrame.shift — pandas 0.22.0 documentation

    • https://pandas.pydata.org/pandas-docs/version/0.22/generated/pandas.DataFrame.shift.html?highlight=shift#pandas.DataFrame.shift
    • DataFrame.shift(periods=1, freq=None, axis=0)
  • python - Pandas: Drop consecutive duplicates - Stack Overflow
    • https://stackoverflow.com/questions/19463985/pandas-drop-consecutive-duplicates
    • a.loc[a.shift() != a]
    • de_dup = a[cols].loc[(a[cols].shift() != a[cols]).any(axis=1)]
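The `shift` comparison keeps a value whenever it differs from the one directly before it, so repeated values that are not adjacent survive. A small sketch:

```python
import pandas as pd

a = pd.Series([1, 1, 2, 2, 1, 3, 3])

# The first element is compared with NaN, so it is always kept.
de_dup = a.loc[a.shift() != a]
```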

How to shift column in dataframe ?

  • python - Shift column in pandas dataframe up by one? - Stack Overflow

    • https://stackoverflow.com/questions/20095673/shift-column-in-pandas-dataframe-up-by-one
    • df.gdp = df.gdp.shift(-1)
    • df[:-1]

How to copy a dataframe ?

  • pandas.DataFrame.copy — pandas 0.23.3 documentation

    • https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.copy.html
    • DataFrame.copy(deep=True)
    • Make a copy of this object’s indices and data.
    • When deep=True (default), a new object will be created with a copy of the calling object’s data and indices. Modifications to the data or indices of the copy will not be reflected in the original object (see notes below).
    • When deep=False, a new object will be created without copying the calling object’s data or index (only references to the data and index are copied). Any changes to the data of the original will be reflected in the shallow copy (and vice versa).

How to create / copy a dataframe without data ?

  • df_others = pd.DataFrame(data=None, columns=df_source.columns, index=df_source.index)

    • It preserves the columns and index and replaces all data with NaN, but with object dtypes
  • df_others = pd.DataFrame().reindex_like(df)
    • It preserves the columns and index and replaces all data with NaN, but with float64 dtypes
  • df_others = df.copy()[:0]
    • It preserves columns and dtypes, but without index and data
  • python - Is there a way to copy only the structure (not the data) of a Pandas DataFrame? - Stack Overflow
    • https://stackoverflow.com/questions/27467730/is-there-a-way-to-copy-only-the-structure-not-the-data-of-a-pandas-dataframe
  • pandas.DataFrame — pandas 0.23.4 documentation
    • http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html#pandas.DataFrame
  • pandas.DataFrame.copy — pandas 0.23.4 documentation
    • https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.copy.html
  • pandas.DataFrame.reindex_like — pandas 0.23.4 documentation
    • https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.reindex_like.html
  • Indexing and Selecting Data — pandas 0.23.4 documentation
    • https://pandas.pydata.org/pandas-docs/stable/indexing.html#slicing-ranges

How to concatenate a single Series into a string ?

  • Working with Text Data — pandas 0.23.1 documentation

    • http://pandas.pydata.org/pandas-docs/stable/text.html#concatenating-a-single-series-into-a-string
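The linked section covers `Series.str.cat`. A minimal sketch, with illustrative values, showing that missing values are skipped unless `na_rep` is given:

```python
import numpy as np
import pandas as pd

s = pd.Series(["a", "b", np.nan, "d"])

joined = s.str.cat(sep=",")                      # missing values are skipped
joined_with_na = s.str.cat(sep=",", na_rep="-")  # missing values are replaced
```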

How to concat dataframe without duplicates ?

  • Pandas/Python: How to concatenate two dataframes without duplicates? - Stack Overflow
  • https://stackoverflow.com/questions/21317384/pandas-python-how-to-concatenate-two-dataframes-without-duplicates
  • pandas.concat([df1,df2]).drop_duplicates().reset_index(drop=True)

Merge, join, and concatenate — pandas 0.23.1 documentation

  • https://pandas.pydata.org/pandas-docs/stable/merging.html#merge-join-and-concatenate
  • python - Pandas: join DataFrames on field with different names? - Stack Overflow
    • https://stackoverflow.com/questions/25888207/pandas-join-dataframes-on-field-with-different-names
    • pandas.merge(df1, df2, how='left', left_on=['id_key'], right_on=['fk_key'])
  • pandas.DataFrame.merge — pandas 0.23.3 documentation
    • https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html#pandas-dataframe-merge
    • DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)
      • how : {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘inner’

        • left: use only keys from left frame, similar to a SQL left outer join; preserve key order
        • right: use only keys from right frame, similar to a SQL right outer join; preserve key order
        • outer: use union of keys from both frames, similar to a SQL full outer join; sort keys lexicographically
        • inner: use intersection of keys from both frames, similar to a SQL inner join; preserve the order of the left keys
      • sort : boolean, default False
        • Sort the join keys lexicographically in the result DataFrame. If False, the order of the join keys depends on the join type (how keyword)
  • pandas.concat — pandas 0.23.4 documentation
    • https://pandas.pydata.org/pandas-docs/stable/generated/pandas.concat.html#pandas.concat
    • pandas.concat(objs, axis=0, join='outer', join_axes=None, ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=None, copy=True)
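A sketch of the left join on differently named keys from the answer above, with made-up data. Both key columns appear in the result, and rows without a match get NaN.

```python
import pandas as pd

df1 = pd.DataFrame({"id_key": [1, 2, 3], "name": ["a", "b", "c"]})
df2 = pd.DataFrame({"fk_key": [1, 3], "score": [10, 30]})

merged = pd.merge(df1, df2, how="left", left_on=["id_key"], right_on=["fk_key"])
```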

How to merge two pandas.Series.unique() ?

  • pandas.Series.unique — pandas 0.23.3 documentation

    • https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.unique.html
    • Return unique values of Series object.
    • Returns: ndarray or Categorical
    • The unique values returned as a NumPy array. In case of categorical data type, returned as a Categorical.
  • numpy.append — NumPy v1.14 Manual
    • https://docs.scipy.org/doc/numpy/reference/generated/numpy.append.html
    • Append values to the end of an array.
    • np.append([1, 2, 3], [[4, 5, 6], [7, 8, 9]])

How to work with missing data ?

  • df['col'].fillna(pandas.Timestamp.min)
  • cols = [c for c in df.columns if 'a' in c]
  • df[cols] = df[cols].fillna( df[cols].mean() )
  • Working with missing data — pandas 0.23.4 documentation
    • http://pandas.pydata.org/pandas-docs/stable/missing_data.html?highlight=fill#working-with-missing-data
  • pandas.DataFrame.fillna — pandas 0.23.4 documentation
    • http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html#pandas-dataframe-fillna
    • DataFrame.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None, **kwargs)
  • pandas.Timestamp.min — pandas 0.23.4 documentation
    • https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Timestamp.min.html
    • Timestamp.min = Timestamp('1677-09-21 00:12:43.145225')
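The column-mean fill above works because `fillna` accepts a Series of per-column values. A runnable sketch with invented columns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "a1": [1.0, np.nan, 3.0],
        "a2": [np.nan, 4.0, 6.0],
        "b": [np.nan, 1.0, 1.0],
    }
)

cols = [c for c in df.columns if "a" in c]
# df[cols].mean() is a Series indexed by column name; each column gets its own mean.
df[cols] = df[cols].fillna(df[cols].mean())
```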

How to convert series to list ?

  • pandas.Series.tolist — pandas 0.23.1 documentation

    • https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.tolist.html

Dataframe information ?

  • pandas.DataFrame.info — pandas 0.23.3 documentation

    • http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.info.html#pandas-dataframe-info
    • DataFrame.info(verbose=None, buf=None, max_cols=None, memory_usage=None, null_counts=None)
  • pandas.DataFrame.describe — pandas 0.23.3 documentation
    • http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.describe.html#pandas.DataFrame.describe
    • DataFrame.describe(percentiles=None, include=None, exclude=None)

How to calculate time differences in seconds ?

  • value = (pd.to_datetime(end_timestamp) - pd.to_datetime(start_timestamp)).total_seconds()
  • df['duration'] = (df['end_timestamp'] - df['start_timestamp']).dt.total_seconds()
  • Time Deltas — pandas 0.23.4 documentation
    • https://pandas.pydata.org/pandas-docs/stable/timedeltas.html#attributes
    • Timedeltas are differences in times, expressed in difference units, e.g. days, hours, minutes, seconds. They can be both positive and negative.
    • Timedelta is a subclass of datetime.timedelta, and behaves in a similar manner, but allows compatibility with np.timedelta64 types as well as a host of custom representation, parsing, and attributes.
    • You can access various components of the Timedelta or TimedeltaIndex directly using the attributes days,seconds,microseconds,nanoseconds. These are identical to the values returned by datetime.timedelta, in that, for example, the .seconds attribute represents the number of seconds >= 0 and < 1 day. These are signed according to whether the Timedelta is signed.
  • pandas.Timedelta.total_seconds — pandas 0.23.4 documentation
    • https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Timedelta.total_seconds.html
    • Total duration of timedelta in seconds (to ns precision)
  • Time Series / Date functionality — pandas 0.23.4 documentation
    • http://pandas.pydata.org/pandas-docs/stable/timeseries.html#
  • python - Calculate Pandas DataFrame Time Difference Between Two Columns in Hours and Minutes - Stack Overflow
    • https://stackoverflow.com/questions/22923775/calculate-pandas-dataframe-time-difference-between-two-columns-in-hours-and-minu
    • .total_seconds()
  • pandas.Series.dt.second — pandas 0.23.4 documentation
    • https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.dt.second.html
    • The seconds of the datetime
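The difference between the `seconds` attribute and `total_seconds()` matters as soon as a duration exceeds one day. A sketch with illustrative timestamps:

```python
import pandas as pd

start = pd.to_datetime("2019-01-01 00:00:00")
end = pd.to_datetime("2019-01-02 00:00:30")
delta = end - start

seconds_component = delta.seconds    # only the part below one day
total = delta.total_seconds()        # the whole duration

df = pd.DataFrame({"start_timestamp": [start], "end_timestamp": [end]})
df["duration"] = (df["end_timestamp"] - df["start_timestamp"]).dt.total_seconds()
```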

How to calculate average value in the last minutes ?

  • df.set_index(pd.DatetimeIndex(df['timestamp']), inplace=True)
  • df['average'] = df['num'].rolling('5min').mean()
  • df.reset_index(drop=True, inplace=True)
  • pandas.DatetimeIndex — pandas 0.23.4 documentation
    • https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DatetimeIndex.html
  • pandas.DataFrame.set_index — pandas 0.23.4 documentation
    • https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.set_index.html
    • DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)
    • Set the DataFrame index (row labels) using one or more existing columns. By default yields a new object.
  • pandas.DataFrame.rolling — pandas 0.23.4 documentation
    • https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rolling.html
    • DataFrame.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None)
    • Provides rolling window calculations.
    • window : int, or offset
      • Size of the moving window. This is the number of observations used for calculating the statistic. Each window will be a fixed size.
      • If its an offset then this will be the time period of each window. Each window will be a variable sized based on the observations included in the time-period. This is only valid for datetimelike indexes. This is new in 0.19.0
  • pandas.Series.mean — pandas 0.23.4 documentation
    • https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.mean.html#pandas-series-mean
    • Series.mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
    • Return the mean of the values for the requested axis
  • python - Pandas Set DatetimeIndex - Stack Overflow
    • https://stackoverflow.com/questions/17328655/pandas-set-datetimeindex
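The three steps (set a datetime index, take a time-based rolling mean, restore the integer index) can be sketched end to end. Column names follow the snippet above; the data is made up.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            [
                "2019-01-01 00:00",
                "2019-01-01 00:02",
                "2019-01-01 00:04",
                "2019-01-01 00:10",
            ]
        ),
        "num": [1.0, 2.0, 3.0, 10.0],
    }
)

# Offset windows such as "5min" need a datetime-like index.
df = df.set_index(pd.DatetimeIndex(df["timestamp"]))
df["average"] = df["num"].rolling("5min").mean()
df = df.reset_index(drop=True)
```

Each window covers the five minutes up to and including the row's own timestamp, so the number of observations per window varies.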

How to read / write file with dataframe ?

  • pandas.read_csv — pandas 0.23.4 documentation

    • http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html#pandas.read_csv
    • Read CSV (comma-separated) file into DataFrame
    • pandas.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, iterator=False, chunksize=None, compression='infer', thousands=None, decimal=b'.', lineterminator=None, quotechar='"', quoting=0, escapechar=None, comment=None, encoding=None, dialect=None, tupleize_cols=None, error_bad_lines=True, warn_bad_lines=True, skipfooter=0, doublequote=True, delim_whitespace=False, low_memory=True, memory_map=False, float_precision=None)
  • pandas.DataFrame.to_csv — pandas 0.23.4 documentation
    • http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html?highlight=to_csv#pandas.DataFrame.to_csv
    • Write DataFrame to a comma-separated values (csv) file
    • DataFrame.to_csv(path_or_buf=None, sep=',', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, mode='w', encoding=None, compression=None, quoting=None, quotechar='"', line_terminator='\n', chunksize=None, tupleize_cols=None, date_format=None, doublequote=True, escapechar=None, decimal='.')
  • How to add pandas data to an existing csv file? - Stack Overflow
    • https://stackoverflow.com/questions/17530542/how-to-add-pandas-data-to-an-existing-csv-file
    • with open(filename, 'a') as f:
    • df.to_csv(f, header=False)
  • How to stop appending a blank line in csv ?
    • pandas.DataFrame.to_csv( line_terminator='\n' )
    • pandas.DataFrame.to_csv — pandas 0.24.2 documentation
      • http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html
      • line_terminator : string, optional
      • The newline character or character sequence to use in the output file. Defaults to os.linesep, which depends on the OS in which this method is called ('\n' for Linux, '\r\n' for Windows).

      • Changed in version 0.24.0.
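A sketch of appending to an existing CSV file, using a temporary path as a stand-in for a real file. Opening the handle with `newline=""` is one common way to avoid extra blank lines on Windows. The terminator keyword is deliberately not passed here because, as far as I know, newer pandas releases spell it `lineterminator` rather than `line_terminator`.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"a": [1, 2]})
path = os.path.join(tempfile.mkdtemp(), "demo.csv")  # hypothetical file location

# First write creates the file with a header row.
df.to_csv(path, index=False)

# Append the same rows again, without repeating the header.
with open(path, "a", newline="") as f:
    df.to_csv(f, header=False, index=False)

round_trip = pd.read_csv(path)
```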


How to fix AssertionError: Number of manager items must equal union of block items ?

  • It is caused by duplicated column names in one dataframe. Find the duplicates and remove them.
  • Pandas Python: Concatenate dataframes having same columns - Stack Overflow
    • https://stackoverflow.com/questions/52204115/pandas-python-concatenate-dataframes-having-same-columns

How to fix FutureWarning Passing list-likes to .loc or [] with any missing label will raise KeyError in the future, you can use .reindex() as an alternative ?

  • a = df.loc[ new_index ]
  • change loc[] to reindex(). a = df.reindex( new_index )
  • Indexing and Selecting Data — pandas 0.23.4 documentation
    • http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-with-list-with-missing-labels-is-deprecated
  • pandas.DataFrame.reindex — pandas 0.23.4 documentation
    • http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.reindex.html?highlight=reindex#pandas-dataframe-reindex
    • DataFrame.reindex(labels=None, index=None, columns=None, axis=None, method=None, copy=True, level=None, fill_value=nan, limit=None, tolerance=None)
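A short sketch of the `reindex` replacement with a label that does not exist in the frame. The missing label yields a row of NaN rather than a warning or error. The labels are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"v": [1, 2, 3]}, index=["a", "b", "c"])
new_index = ["a", "c", "z"]  # "z" is not in df.index

a = df.reindex(new_index)
```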

How to fix SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead ?

  • a = df.loc[ new_index]
  • a[ 'col1' ] = a[ 'col2' ]
  • change a = df.loc[ new_index] to a = df.loc[ new_index].copy()
  • Indexing and Selecting Data — pandas 0.23.4 documentation
    • http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  • python - How to deal with SettingWithCopyWarning in Pandas? - Stack Overflow
    • https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas
    • If you explicitly copy then no further warning will happen
  • .loc[...] = value returns SettingWithCopyWarning · Issue #17476 · pandas-dev/pandas · GitHub
    • https://github.com/pandas-dev/pandas/issues/17476
    • Pandas isn't 100% sure if you want to assign values to just your df_c slice, or have it propagate all the way back up to the original df. To avoid this when you first assign df_c make sure you tell pandas that it is its own data frame (and not a slice) by using .copy()
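The `.copy()` fix can be sketched as follows, with made-up data. After the explicit copy, assignments affect only the new frame and the original stays untouched.

```python
import pandas as pd

df = pd.DataFrame({"col1": [0, 0, 0], "col2": [1, 2, 3]}, index=["x", "y", "z"])
new_index = ["x", "z"]

# Explicit copy: a is its own frame, not a slice of df.
a = df.loc[new_index].copy()
a["col1"] = a["col2"]
```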

How to fix TypeError: invalid type promotion when plotting a scatter with timestamps ?

  • python - Pandas type error trying to plot - Stack Overflow

    • https://stackoverflow.com/questions/33676608/pandas-type-error-trying-to-plot
  • pandas.DataFrame.astype — pandas 0.22.0 documentation
    • https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.astype.html

How to fix TypeError: 'instancemethod' object has no attribute '__getitem__' ?

  • Problem: a = df.reindex[ new_index ]
  • Fix: reindex is a method, so call it with parentheses instead of indexing it with square brackets: a = df.reindex( new_index )
  • You are using square brackets after an object that doesn't know what to do with the square brackets.

How to fix TypeError: type object argument after * must be an iterable, not itertools.imap ?

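  • Problem: df.drop_duplicates(subset=['position_xy'], inplace=False) fails when the position_xy column holds lists.
  • Fix: convert the list values in the column to tuples before dropping duplicates.
  • df['position_xy'] = df['position_xy'].apply(lambda x : tuple(x) if type(x) is list else x)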
  • python - Pandas drop_duplicates - TypeError: type object argument after * must be a sequence, not map - Stack Overflow
    • https://stackoverflow.com/questions/37792999/pandas-drop-duplicates-typeerror-type-object-argument-after-must-be-a-seque
    • it's because the list type isn't hashable and that's messing up the duplicated logic. As a workaround you could cast to tuple.
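
The workaround above can be sketched end to end. The frame and the name column are hypothetical; position_xy follows the column name used in these notes.

```python
import pandas as pd

# Hypothetical data: the position_xy column holds Python lists, which are not hashable.
df = pd.DataFrame({
    "name": ["p", "q", "r"],
    "position_xy": [[0, 1], [0, 1], [2, 3]],
})

# Convert every list to a tuple so the values can be hashed and compared.
df["position_xy"] = df["position_xy"].apply(
    lambda x: tuple(x) if isinstance(x, list) else x
)

# drop_duplicates keeps the first row of each duplicated position by default.
deduped = df.drop_duplicates(subset=["position_xy"])
print(deduped)
```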

How to fix ValueError: The truth value of a Series is ambiguous ?

  • python - Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all() - Stack Overflow
    • https://stackoverflow.com/questions/36921951/truth-value-of-a-series-is-ambiguous-use-a-empty-a-bool-a-item-a-any-o
    • The or and and python statements require truth-values. For pandas these are considered ambiguous so you should use "bitwise" | (or) or & (and) operations
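
As a small illustration of the advice above, with a made-up Series: element-wise conditions are combined with & and |, each comparison in its own parentheses, and a single True or False is obtained with any() or all().

```python
import pandas as pd

# Hypothetical data for illustration.
s = pd.Series([1, 5, 10, 15])

# Writing (s > 2) and (s < 12) raises the ValueError, because "and" needs one truth value.
# Use & and | instead; the parentheses are required because & and | bind tighter than < and >.
in_range = (s > 2) & (s < 12)
out_of_range = (s < 2) | (s > 12)

# Boolean Series can be used directly as a filter.
selected = s[in_range]

# When a single boolean really is needed, reduce explicitly.
any_large = (s > 12).any()
all_positive = (s > 0).all()

print(selected)
print(any_large, all_positive)
```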

How to fix ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', u'occurred at index 0') ?

  • Problem
    • cols = [c for c in df.columns if not c.startswith('a')]
    • df[cols] = df[cols].apply( lambda x : 0 if x < 1e-10 else x, axis=1 )
  • Cause: with axis=1, DataFrame.apply passes each row to the lambda as a Series, so x < 1e-10 is a boolean Series and the if expression cannot reduce it to a single True or False.
  • Solution 1: apply the lambda column by column, so that x is a scalar
    • for c in df.columns:
    •     if not c.startswith('a'):
    •         df[c] = df[c].apply( lambda x : 0 if x < 1e-10 else x )
  • Solution 2: assign through a boolean mask
    • df1 = df[cols].copy()
    • df1[df1 < 1e-10] = 0
    • df[cols] = df1[cols].copy()
  • 5 ways to apply an IF condition in pandas DataFrame - Data to Fish
    • https://datatofish.com/if-condition-in-pandas-dataframe/
  • python - Applying a conditional statement to all value of a dataframe - Stack Overflow
    • https://stackoverflow.com/questions/43377868/applying-a-conditional-statement-to-all-value-of-a-dataframe
    • df[df<3]=0
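
A vectorized variant of the same idea, sketched on made-up data. It uses DataFrame.mask, which replaces the values where the condition is True, so no Series is ever evaluated inside an if. The column names a_id, b, and c are hypothetical.

```python
import pandas as pd

# Hypothetical data: columns starting with "a" must stay untouched.
df = pd.DataFrame({
    "a_id": [1.0, 2.0],
    "b": [1e-12, 0.5],
    "c": [3.0, 1e-15],
})

cols = [c for c in df.columns if not c.startswith("a")]

# Replace every value below the threshold with 0 in the selected columns only.
df[cols] = df[cols].mask(df[cols] < 1e-10, 0)

print(df)
```

This does the same job as Solution 2 without the intermediate frame.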

How to fix ValueError: can not merge DataFrame with instance of type <class 'pandas.core.series.Series'> ?

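  • The error is raised when DataFrame.merge receives a Series instead of a DataFrame. One fix is to name the aggregated Series and combine them with pd.concat:
  • df_mean = df.groupby('id').col.mean().rename('mean_col')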
  • df_min = df.groupby('id').col.min().rename('min_col')
  • df_result = pd.concat([df_mean, df_min], axis=1).reset_index()
  • python - Merging two DataFrames - Stack Overflow
    • https://stackoverflow.com/questions/37968785/merging-two-dataframes
    • df1.merge(df2.to_frame(), left_on='id', right_index=True)
  • python - Combining two Series into a DataFrame in pandas - Stack Overflow
    • https://stackoverflow.com/questions/18062135/combining-two-series-into-a-dataframe-in-pandas
    • pd.concat([s1, s2], axis=1).reset_index()
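
Both approaches listed above can be sketched on a small made-up frame. The column names id and col follow the snippet in these notes; the values are hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"id": [1, 1, 2, 2], "col": [1.0, 3.0, 5.0, 9.0]})

# Each groupby aggregation returns a Series indexed by id; rename() gives it a column name.
df_mean = df.groupby("id")["col"].mean().rename("mean_col")
df_min = df.groupby("id")["col"].min().rename("min_col")

# Option 1: place the Series side by side, then turn the id index back into a column.
df_result = pd.concat([df_mean, df_min], axis=1).reset_index()

# Option 2: convert the Series to a one-column DataFrame before merging it onto df.
df_merged = df.merge(df_mean.to_frame(), left_on="id", right_index=True)

print(df_result)
print(df_merged)
```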
