参数说明:

  • coordinates:Represents the geographic location of this Tweet as reported by the user or client application. The inner coordinates array is formatted as geoJSON (longitude first, then latitude).

1.文本文件转 json 格式

  读取 txt 文件中的 tweets 文本,将其转为 json 格式,可以打印输出,也可以提取详细信息

代码:

import json
import os folderpath = r"D:\Twitter Data\Data\test"
files = os.listdir(folderpath)
os.chdir(folderpath) # get the first txt file
tweets_data_path = files[0] # store json format file in this array
tweets_data = []
tweets_file = open(tweets_data_path, "r")
for line in tweets_file:
try:
tweet = json.loads(line)
tweets_data.append(tweet)
except:
continue
# print json format file with indentation
print(json.dumps(tweets_data[0], indent=4))

输出:

{
"created_at": "Tue Jun 25 20:44:34 +0000 2019",
"id": 1143621025550049280,
"id_str": "1143621025550049280",
"text": "Australia beat the Poms overnight \ud83d\ude01\ud83c\udfcf\ud83c\udde6\ud83c\uddfa\ud83c\udff4\udb40\udc67\udb40\udc62\udb40\udc65\udb40\udc6e\udb40\udc67\udb40\udc7f #AUSvENG #CmonAussie #CWC19",
"source": "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>",
"truncated": false,
"in_reply_to_status_id": null,
"in_reply_to_status_id_str": null,
"in_reply_to_user_id": null,
"in_reply_to_user_id_str": null,
"in_reply_to_screen_name": null,
"user": {
"id": 252426781,
"id_str": "252426781",
"name": "Willy Aitch",
"screen_name": "WillyAitch",
"location": "Melbourne, Victoria",
"url": null,
"description": "September 2017 to February 2018, was the greatest 5 months ever. Richmond \ud83d\udc2f\ud83d\udc2f\ud83d\udc2fwon the 2017 AFL Premiership! Philadelphia Eagles \ud83e\udd85\ud83e\udd85\ud83e\udd85 won Super Bowl LII",
"translator_type": "none",
"protected": false,
"verified": false,
"followers_count": 417,
"friends_count": 1061,
"listed_count": 15,
"favourites_count": 18852,
"statuses_count": 17796,
"created_at": "Tue Feb 15 04:55:59 +0000 2011",
"utc_offset": null,
"time_zone": null,
"geo_enabled": true,
"lang": null,
"contributors_enabled": false,
"is_translator": false,
"profile_background_color": "C0DEED",
"profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png",
"profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png",
"profile_background_tile": false,
"profile_link_color": "1DA1F2",
"profile_sidebar_border_color": "C0DEED",
"profile_sidebar_fill_color": "DDEEF6",
"profile_text_color": "333333",
"profile_use_background_image": true,
"profile_image_url": "http://pbs.twimg.com/profile_images/1112669591342211072/rnbV0dCK_normal.jpg",
"profile_image_url_https": "https://pbs.twimg.com/profile_images/1112669591342211072/rnbV0dCK_normal.jpg",
"profile_banner_url": "https://pbs.twimg.com/profile_banners/252426781/1522377977",
"default_profile": true,
"default_profile_image": false,
"following": null,
"follow_request_sent": null,
"notifications": null
},
"geo": null,
"coordinates": null,
"place": {
"id": "01864a8a64df9dc4",
"url": "https://api.twitter.com/1.1/geo/id/01864a8a64df9dc4.json",
"place_type": "city",
"name": "Melbourne",
"full_name": "Melbourne, Victoria",
"country_code": "AU",
"country": "Australia",
"bounding_box": {
"type": "Polygon",
"coordinates": [
[
[
144.593742,
-38.433859
],
[
144.593742,
-37.511274
],
[
145.512529,
-37.511274
],
[
145.512529,
-38.433859
]
]
]
},
"attributes": {}
},
"contributors": null,
"is_quote_status": false,
"quote_count": 0,
"reply_count": 0,
"retweet_count": 0,
"favorite_count": 0,
"entities": {
"hashtags": [
{
"text": "AUSvENG",
"indices": [
46,
54
]
},
{
"text": "CmonAussie",
"indices": [
55,
66
]
},
{
"text": "CWC19",
"indices": [
67,
73
]
}
],
"urls": [],
"user_mentions": [],
"symbols": []
},
"favorited": false,
"retweeted": false,
"filter_level": "low",
"lang": "en",
"timestamp_ms": "1561495474599"
}

2. 读取关键字内容

  通过 .keys() 获取所有的键值

代码:

import json
import os folderpath = r"D:\Twitter Data\Data\test"
files = os.listdir(folderpath)
os.chdir(folderpath) # get the first txt file
tweets_data_path = files[0] # store json format file in this array
tweets_data = []
tweets_file = open(tweets_data_path, "r")
for line in tweets_file:
try:
tweet = json.loads(line)
tweets_data.append(tweet)
except:
continue for k in tweets_data[0].keys():
print(k)

输出:

created_at
id
id_str
text
source
truncated
in_reply_to_status_id
in_reply_to_status_id_str
in_reply_to_user_id
in_reply_to_user_id_str
in_reply_to_screen_name
user
geo
coordinates
place
contributors
is_quote_status
quote_count
reply_count
retweet_count
favorite_count
entities
favorited
retweeted
filter_level
lang
timestamp_ms

3. 输出键值信息

代码:

import json
import os folderpath = r"D:\Twitter Data\Data\test"
files = os.listdir(folderpath)
os.chdir(folderpath) # get the first txt file
tweets_data_path = files[0] # store json format file in this array
tweets_data = []
tweets_file = open(tweets_data_path, "r")
for line in tweets_file:
try:
tweet = json.loads(line)
tweets_data.append(tweet)
except:
continue for k in tweets_data[0].keys():
print(k, ":", tweets_data[0][k])
print()

输出:

created_at : Tue Jun 25 20:44:34 +0000 2019

id : 1143621025550049280

id_str : 1143621025550049280

text : Australia beat the Poms overnight 												

【439】Tweets processing by Python的更多相关文章

  1. 【转】实习小记-python 内置函数__eq__函数引发的探索

    [转]实习小记-python 内置函数__eq__函数引发的探索 乱写__eq__会发生啥?请看代码.. >>> class A: ... def __eq__(self, othe ...

  2. 【转】实习小记-python中可哈希对象是个啥?what is hashable object in python?

    [转]实习小记-python中可哈希对象是个啥?what is hashable object in python? 废话不多说直接祭上python3.3x的文档:(原文链接) object.__ha ...

  3. 【6】TensorFlow光速入门-python模型转换为tfjs模型并使用

    本文地址:https://www.cnblogs.com/tujia/p/13862365.html 系列文章: [0]TensorFlow光速入门-序 [1]TensorFlow光速入门-tenso ...

  4. 【Linux】CentOS下升级Python和Pip版本全自动化py脚本

    [Linux]CentOS下升级Python和Pip版本全自动化py脚本 CentOS7.6自带py2.7和py3.6 想要安装其它版本的话就要自己重新下载和编译py其它版本并且配置环境,主要是软链接 ...

  5. 【原】使用Json作为Python和C#混合编程时对象转换的中间文件

    一.Python中自定义类对象json字符串化的步骤[1]   1. 用 json 或者simplejson 就可以: 2.定义转换函数: 3. 定义类 4. 生成对象 5.dumps执行,引入转换函 ...

  6. 【解决方案】django初始化执行python manage.py migrate命令后,除default数据库之外的其他数据库中的表没有创建出来

    [问题原因]:django工程中存在多个应用,每个应用都指定了对应的数据库.执行python manage.py migrate命令时没有指定数据库,将只初始化默认的default数据库. [解决方案 ...

  7. 【转】利用Psyco提升Python运行速度

    转自:http://www.leeon.me/a/use-Psyco-to-improve-Python-speed Psyco 是严格地在 Python 运行时进行操作的.也就是说,Python 源 ...

  8. 【转】你真的理解Python中MRO算法吗?

    你真的理解Python中MRO算法吗? MRO(Method Resolution Order):方法解析顺序. Python语言包含了很多优秀的特性,其中多重继承就是其中之一,但是多重继承会引发很多 ...

  9. 【算法】八皇后问题 Python实现

    [八皇后问题] 问题: 国际象棋棋盘是8 * 8的方格,每个方格里放一个棋子.皇后这种棋子可以攻击同一行或者同一列或者斜线(左上左下右上右下四个方向)上的棋子.在一个棋盘上如果要放八个皇后,使得她们互 ...

随机推荐

  1. python数据分析之数据分布

    转自链接:https://blog.csdn.net/YEPAO01/article/details/99197487 一.查看数据分布趋势 import pandas as pd import nu ...

  2. wait,waitpid学习测试

    用man wait学习wait waitpid的使用 wait()函数功能:wait()函数使父进程暂停执行,直到它的一个子进程结束为止,该函数的返回值是终止运行的子进程的PID. 参数status所 ...

  3. Kubernetes 学习20调度器,预选策略及优选函数

    一.概述 1.k8s集群中能运行pod资源的其实就是我们所谓的节点,也称为工作节点.master从本质上来讲,他其实是运行整个集群的控制平面组件的比如apiserver,scheal,controlm ...

  4. include和taglib指令

    1.include指令用来包含另一个静态文件,这个静态文件可以是一个JSP页面.一个Servlet.文本文件.JSP代码. include.jsp <%@ page contentType=&q ...

  5. 【FTP】Wireshark学习FTP流程

    一.Wireshark概述 在windows下, 图1 Wireshark界面展示(基于1.99.1) Wireshark是通过底层的winpcap来实现抓包的.winpcap是用于网络封包抓取的一套 ...

  6. Nginx 安装配置【必须把文件到放到机器上】

    [必须把所有下载的gz文件到放到机器上:编译] 1.安装nginx之前的编译软件 yum -y install make zlib zlib-devel gcc-c++ libtool  openss ...

  7. 【数论】P1029 最大公约数和最小公倍数问题

    题目链接 P1029 最大公约数和最小公倍数问题 思路 如果有两个数a和b,他们的gcd(a,b)和lcm(a,b)的乘积就等于ab. 也就是: ab=gcd(a,b)*lcm(a,b) 那么,接下来 ...

  8. 【概率论】5-6:正态分布(The Normal Distributions Part III)

    title: [概率论]5-6:正态分布(The Normal Distributions Part III) categories: - Mathematic - Probability keywo ...

  9. 超炫酷的 Docker 终端 UI lazydocker

    A simple terminal UI for both docker and docker-compose, written in Go with the gocui library. https ...

  10. 代码还原,IDA中使用的宏

    在IDA7.0中的定义文件拷贝的. 如果想使用,直接去IDA的plugins插件目录下.包含它的 **defs.h"" 如下: /* This file contains defi ...