11 Python Libraries You Might Not Know

by Greg | January 20, 2015


There are tons of Python packages out there. So many that no one man or woman could possibly catch them all. PyPi alone has over 47,000 packages listed!

Recently, with so many data scientists making the switch to Python, I couldn't help but think that while they're getting some of the great benefits of pandasscikit-learn, and numpy, they're missing out on some older yet equally helpful Python libraries.

In this post, I'm going to highlight some lesser-known libraries. Even you experienced Pythonistas should take a look, there might be one or two in there you've never seen!

1) delorean

Delorean is a really cool date/time library. Apart from having a sweet name, it's one of the more natural feeling date/time munging libraries I've used in Python. It's sort of like moment in javascript, except I laugh every time I import it. The docs are also good and in addition to being technically helpful, they also make countless Back to the Future references.

from delorean import Delorean
EST = "US/Eastern"
d = Delorean(timezone=EST)

2) prettytable

There's a chance you haven't heard of prettytable because it's listed on GoogleCode, which is basically the coding equivalent of Siberia.

Despite being exiled to a cold, snowy and desolate place, prettytable is great for constructing output that looks good in the terminal or in the browser. So if you're working on a new plug-in for the IPython Notebook, check out prettytable for your HTML __repr__.

from prettytable import PrettyTable
table = PrettyTable(["animal", "ferocity"])
table.add_row(["wolverine", 100])
table.add_row(["grizzly", 87])
table.add_row(["Rabbit of Caerbannog", 110])
table.add_row(["cat", -1])
table.add_row(["platypus", 23])
table.add_row(["dolphin", 63])
table.add_row(["albatross", 44])
table.sort_key("ferocity")
table.reversesort = True
+----------------------+----------+
| animal | ferocity |
+----------------------+----------+
| Rabbit of Caerbannog | 110 |
| wolverine | 100 |
| grizzly | 87 |
| dolphin | 63 |
| albatross | 44 |
| platypus | 23 |
| cat | -1 |
+----------------------+----------+

3) snowballstemmer

Ok so the first time I installed snowballstemmer, it was because I thought the name was cool. But it's actually a pretty slick little library. snowballstemmer will stem words in 15 different languages and also comes with a porter stemmer to boot.

from snowballstemmer import EnglishStemmer, SpanishStemmer
EnglishStemmer().stemWord("Gregory")
# Gregori
SpanishStemmer().stemWord("amarillo")
# amarill

4) wget

Remember every time you wrote that web crawler for some specific purpose? Turns out somebody built it...and it's called wget. Recursively download a website? Grab every image from a page? Sidestep cookie traces? Done, done, and done.

Movie Mark Zuckerberg even says it himself

First up is Kirkland, they keep everything open and allow indexes on their apache configuration, so a little wget magic is enough to download the entire Kirkland facebook. Kid stuff!

The Python version comes with just about every feature you could ask for and is easy to use.

import wget
wget.download("http://www.cnn.com/")
# 100% [............................................................................] 280385 / 280385

Note that another option for linux and osx users would be to use do: from sh import wget. However the Python wget module does have a better argument handline.

5) PyMC

I'm not sure how PyMC gets left out of the mix so often. scikit-learn seems to be everyone's darling (as it should, it's fantastic), but in my opinion, not enough love is given to PyMC.

from pymc.examples import disaster_model
from pymc import MCMC
M = MCMC(disaster_model)
M.sample(iter=10000, burn=1000, thin=10)
[-----------------100%-----------------] 10000 of 10000 complete in 1.4 sec

If you don't already know it, PyMC is a library for doing Bayesian analysis. It's featured heavily in Cam Davidson-Pilon's Bayesian Methods for Hackers and has made cameos on a lot of popular data science/python blogs, but has never received the cult following akin to scikit-learn.

6) sh

I can't risk you leaving this page and not knowing about shsh lets you import shell commands into Python as functions. It's super useful for doing things that are easy in bash but you can't remember how to do in Python (i.e. recursively searching for files).

from sh import find
find("/tmp")
/tmp/foo
/tmp/foo/file1.json
/tmp/foo/file2.json
/tmp/foo/file3.json
/tmp/foo/bar/file3.json

7) fuzzywuzzy

Ranking in the top 10 of simplest libraries I've ever used (if you have 2-3 minutes, you can read through the source), fuzzywuzzy is a fuzzy string matching library built by the fine people at SeatGeek.

fuzzywuzzy implements things like string comparison ratios, token ratios, and plenty of other matching metrics. It's great for creating feature vectors or matching up records in different databases.

from fuzzywuzzy import fuzz
fuzz.ratio("Hit me with your best shot", "Hit me with your pet shark")
# 85

8) progressbar

You know those scripts you have where you do a print "still going..." in that giant mess of a for loop you call your __main__? Yeah well instead of doing that, why don't you step up your game and start using progressbar?

progressbar does pretty much exactly what you think it does...makes progress bars. And while this isn't exactly a data science specific activity, it does put a nice touch on those extra long running scripts.

Alas, as another GoogleCode outcast, it's not getting much love (the docs have 2 spaces for indents...2!!!). Do what's right and give it a good ole pip install.

from progressbar import ProgressBar
import time
pbar = ProgressBar(maxval=10)
for i in range(1, 11):
pbar.update(i)
time.sleep(1)
pbar.finish()
# 60% |######################################################## |

9) colorama

So while you're making your logs have nice progress bars, why not also make them colorful! It can actually be helpful for reminding yourself when things are going horribly wrong.

colorama is super easy to use. Just pop it into your scripts and add any text you want to print to a color:

10) uuid

I'm of the mind that there are really only a few tools one needs in programming: hashing, key/value stores, and universally unique ids. uuid is the built in Python UUID library. It implements versions 1, 3, 4, and 5 of the UUID standards and is really handy for doing things like...err...ensuring uniqueness.

That might sound silly, but how many times have you had records for a marketing campaign, or an e-mail drop and you want to make sure everyone gets their own promo code or id number?

And if you're worried about running out of ids, then fear not! The number of UUIDs you can generate is comparable to the number of atoms in the universe.

import uuid
print uuid.uuid4()
# e7bafa3d-274e-4b0a-b9cc-d898957b4b61

Well if you were a uuid you probably would be.

11) bashplotlib

Shameless self-promotion here, bashplotlib is one of my creations. It lets you plot histograms and scatterplots using stdin. So while you might not find it replacing ggplot or matplotlib as your everyday plotting library, the novelty value is quite high. At the very least, use it as a way to spruce up your logs a bit.

$ pip install bashplotlib
$ scatter --file data/texas.txt --pch x

11 Python Libraries You Might Not Know的更多相关文章

  1. 1.3 Essential Python Libraries(一些重要的Python库)

    1.3 Essential Python Libraries(一些重要的Python库) 如果不了解Python的数据生态,以及本书中即将用到的一些库,这里会做一个简单的介绍: Numpy 这里就不过 ...

  2. macosx 10.11 python pip install 出现错误OSError: [Errno 1] Operation not permitted:

    Exception: Traceback (most recent call last): File , in main status = self.run(options, args) File , ...

  3. 11.python之线程,协程,进程,

    一,进程与线程 1.什么是线程 线程是操作系统能够进行运算调度的最小单位.它被包含在进程之中,是进程中的实际运作单位.一条线程指的是进程中一个单一顺序的控制流,一个进程中可以并发多个线程,每条线程并行 ...

  4. #11 Python字典

    前言 前两节介绍了Python列表和字符串的相关用法,这两种数据类型都是有序的数据类型,所以它们可以通过索引来访问内部元素.本文将记录一种无序的数据类型——字典! 一.字典与列表和字符串的区别 字典是 ...

  5. 11 Python 文件操作

    文件操作基本流程 计算机系统分为:计算机硬件,操作系统,应用程序三部分. 我们用python或其他语言编写的应用程序若想要把数据永久保存下来,必须要保存于硬盘中,这就涉及到应用程序要操作硬件,众所周知 ...

  6. 11.python中的元组

    在学习什么是元组之前,我们先来看看如何创建一个元组对象: a = ('abc',123) b = tuple(('def',456)) print a print b

  7. 11 python初学 (文件)

    对文件的操作分为 3 步: 打开文件: f = open('望月怀古', 'r', encoding='utf8') # 路径可以写绝对路径,也可以写相对路径: 操作 关闭文件: f.close() ...

  8. 11.python描述符---类的装饰器---@property

    描述符1.描述符是什么:描述符本质就是一个新式类,在这个新式类中,至少实现了__get__(),__set__(),__delete__()这三个内置方法中的一个,描述符也被称为描述符协议(1):__ ...

  9. 11: python递归

    1.1 递归讲解 1.定义 1. 在函数内部,可以调用其他函数.如果一个函数在内部调用自身本身,这个函数就是递归函数. 2.递归特性 1. 必须有一个明确的结束条件 2. 每次进入更深一层递归时,问题 ...

随机推荐

  1. mysql 学习之1 mysql在window系统下的安装

    转载: https://blog.csdn.net/weixin_43295278/article/details/8287440 此方法只 适用 于window系统 坑 此篇文章在使用 alter ...

  2. H5新增的postMessage跨域解决方案Demo

    Demo背景:html中使用iframe嵌入了跨域的vue项目,在html中将参数传入到跨越的vue项目中. 向跨越的子窗口中发送数据 function sendMessage(data) { // ...

  3. jQuery的两把利器

    1 jQuery核心函数 * 简称: jQuery函数($/jQuery) * jQuery库向外直接暴露的就是$/jQuery * 引入jQuery库后, 直接使用$即可 * 当函数用: $(xxx ...

  4. 天猫精灵业务如何使用机器学习PAI进行模型推理优化

    引言 天猫精灵(TmallGenie)是阿里巴巴人工智能实验室(Alibaba A.I.Labs)于2017年7月5日发布的AI智能语音终端设备.天猫精灵目前是全球销量第三.中国销量第一的智能音箱品牌 ...

  5. QueryList采集页面链接及对应标题

    <?php header('content-type:text/html;charset=utf-8'); require 'vendor/autoload.php'; use QL\Query ...

  6. JavaWeb学习篇之----容器Request详解

    前篇说到了Response容器对象,这篇我们就来看一下Request容器对象,之前也说过了,这个两个容器对象是相对应的,每次用户请求服务器的时候web容器就会给创建这对容器对象,他们是共存亡的,当然R ...

  7. bzoj1042题解

    [题意分析] 有q个询问,每次询问求一个只有四个物品的背包方案数. [解题思路] 先计算出所有面值硬币的无限背包方案数. 考虑容斥,每个限制表示选取大于di个面值ci的硬币,总方案数=0限制方案数-1 ...

  8. 10、 导出python脚本进行数据驱动的接口测试

    postman自带脚本导出功能,对于代码小白来说,可以不错的学习代码级接口测试 第一步:输入接口地址,点击send 第二步:点击code,导出脚本文件,为python脚本 第三步:安装python3以 ...

  9. 大神给你分析HTTPS和HTTP的区别

    今天在做雅虎的时候,发现用第三方工具截取不到客户端与服务端的通讯,以前重来没碰到过这种情况,仔细看了看,它的url请求时基于https的,gg了下发现原来https协议和http有着很大的区别.总的来 ...

  10. spark自定义函数之——UDAF使用详解及代码示例

    UDAF简介 UDAF(User Defined Aggregate Function)即用户定义的聚合函数,聚合函数和普通函数的区别是什么呢,普通函数是接受一行输入产生一个输出,聚合函数是接受一组( ...