AWS: Uploading Files, Deleting Files, and Image Recognition
Uploading and deleting S3 files on AWS, and recognizing text in images
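Preparation
Install the AWS CLI
Download and run the installer for your operating system. Installation is straightforward, so it is not covered here.
After installation, run the following two commands to verify that the AWS CLI was installed correctly. The example below uses the Terminal app on macOS; on Windows, open cmd.
- where aws (Windows) / which aws (macOS, Linux): shows the AWS CLI installation path
- aws --version: shows the AWS CLI version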
zonghan@MacBook-Pro ~ % aws --version
aws-cli/2.0.30 Python/3.7.4 Darwin/21.6.0 botocore/2.0.0dev34
zonghan@MacBook-Pro ~ % which aws
/usr/local/bin/aws
Initial AWS CLI configuration
Before using the AWS CLI, run the aws configure command to complete the initial configuration.
zonghan@MacBook-Pro ~ % aws configure
AWS Access Key ID [None]: AKIA3GRZL6WIQEXAMPLE
AWS Secret Access Key [None]: k+ci5r+hAcM3x61w1example
Default region name [None]: ap-east-1
Default output format [None]: json
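The AWS Access Key ID and AWS Secret Access Key are obtained from the AWS Management Console. The AWS CLI uses them, much like a username and password, to authenticate against AWS services.
Click your user name in the top-right corner of the AWS Management Console --> select Security Credentials
- Click Create New Access Key to create an Access Key ID and Secret Access Key pair, and save it (the secret key can only be saved at creation time)
- Default region name specifies the code of the AWS Region to connect to. The code for each Region is listed in the AWS documentation.
- Default output format specifies the format of command-line output. JSON is the default for all output. The following formats are available:
JSON (JavaScript Object Notation)
YAML: only available in AWS CLI v2
Text
Table
For more detailed configuration, see the AWS CLI documentation.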
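The answers given to aws configure are stored in two INI-style files, ~/.aws/credentials and ~/.aws/config. The sketch below parses sample contents of both files with Python's configparser to show that layout. The sample text reuses the placeholder values from the session above, and read_profile is a hypothetical helper name, not part of the AWS CLI or boto3.

```python
import configparser

# Sample contents mirroring what `aws configure` writes (placeholder values only).
SAMPLE_CREDENTIALS = """
[default]
aws_access_key_id = AKIA3GRZL6WIQEXAMPLE
aws_secret_access_key = k+ci5r+hAcM3x61w1example
"""

SAMPLE_CONFIG = """
[default]
region = ap-east-1
output = json
"""


def read_profile(credentials_text, config_text, profile="default"):
    """Hypothetical helper: merge one profile from both files into a dict."""
    merged = {}
    for text in (credentials_text, config_text):
        # Disable interpolation so a '%' inside a secret key is read literally.
        parser = configparser.ConfigParser(interpolation=None)
        parser.read_string(text)
        if parser.has_section(profile):
            merged.update(parser[profile])
    return merged


profile = read_profile(SAMPLE_CREDENTIALS, SAMPLE_CONFIG)
print(profile["region"], profile["output"])
```

boto3 reads the same files, which is why the code later in this article can create clients without passing credentials explicitly.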
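Getting access to an S3 bucket
The user whose credentials are configured on this machine needs permission to access an S3 bucket. This is normally granted by your administrator.
Getting access to image text recognition
The same user also needs permission to use Amazon Textract. This is also normally granted by your administrator.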
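The article does not say which permissions the administrator grants. As an assumption, an IAM policy covering the operations used below (put, get, and delete on S3 objects, plus the two Textract calls) might look like the following sketch. The bucket name my-example-bucket is hypothetical, and a real policy may well be narrower or broader.

```python
import json

BUCKET = "my-example-bucket"  # hypothetical bucket name

# Assumed minimal policy for the S3 and Textract calls used in this article.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        },
        {
            "Effect": "Allow",
            "Action": ["textract:AnalyzeExpense", "textract:AnalyzeDocument"],
            "Resource": "*",
        },
    ],
}

print(json.dumps(policy, indent=2))
```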
The AWS SDK
import boto3
from botocore.exceptions import ClientError, BotoCoreError
Install the boto3 module (for example with pip install boto3); botocore is normally installed along with it as a dependency.
Uploading files
Method 1
Upload the file with the upload_file method
import logging
import os

import boto3
from botocore.exceptions import ClientError


def upload_file(file_path, bucket, file_name=None):
    """Upload a file to an S3 bucket

    :param file_path: Path of the local file to upload
    :param bucket: Bucket to upload to
    :param file_name: S3 object name. If not specified then the base name of file_path is used
    :return: True if file was uploaded, else False
    """
    # If the S3 object name was not specified, use the base name of file_path
    if file_name is None:
        file_name = os.path.basename(file_path)

    # Upload the file
    s3_client = boto3.client('s3')
    # s3 = boto3.resource('s3')
    try:
        s3_client.upload_file(file_path, bucket, file_name)
        # s3.Bucket(bucket).upload_file(file_path, file_name)
    except ClientError as e:
        logging.error(e)
        return False
    return True
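upload_file also accepts an ExtraArgs dictionary. Since the files uploaded here are later read back as images, it can help to set the Content-Type explicitly. A minimal sketch follows; build_extra_args is a hypothetical helper, and the fallback type for unknown extensions is an assumption.

```python
import mimetypes


def build_extra_args(file_path):
    """Hypothetical helper: guess a Content-Type for upload_file's ExtraArgs."""
    content_type, _encoding = mimetypes.guess_type(file_path)
    # Assumption: fall back to a generic binary type when the extension is unknown.
    return {"ContentType": content_type or "application/octet-stream"}


print(build_extra_args("invoices/2023-01.png"))

# Usage with a real client (not executed here):
# s3_client.upload_file(file_path, bucket, file_name,
#                       ExtraArgs=build_extra_args(file_path))
```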
Method 2
Upload the file with PutObject
import logging
import os

import boto3
from botocore.exceptions import ClientError, BotoCoreError

logger = logging.getLogger(__name__)


def upload_file_to_aws(file_path, bucket, file_name=None):
    """Upload a file to an S3 bucket

    :param file_path: File to upload
    :param bucket: Bucket to upload to
    :param file_name: S3 object name. If not specified then the base name of file_path is used
    :return: True if file was uploaded, else False
    """
    # If the S3 object name was not specified, use the base name of file_path
    if file_name is None:
        file_name = os.path.basename(file_path)

    # Upload the file
    s3 = boto3.resource('s3')
    try:
        with open(file_path, 'rb') as f:
            data = f.read()
        obj = s3.Object(bucket, file_name)
        obj.put(Body=data)
    except (ClientError, BotoCoreError) as e:
        # ClientError: error returned by S3; BotoCoreError: client-side failure
        logger.error(e)
        return False
    return True
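Method 2 reads the whole file into memory before sending it. put also accepts an open binary file object as Body, which avoids that extra copy. The sketch below takes the S3 object as a parameter so that it can be tried without AWS access; put_file and FakeObject are hypothetical names used only for illustration.

```python
import os
import tempfile


def put_file(s3_object, file_path):
    """Hypothetical helper: send an open file handle as the PutObject body."""
    with open(file_path, "rb") as f:
        # With boto3 this would be: boto3.resource("s3").Object(bucket, key).put(Body=f)
        return s3_object.put(Body=f)


class FakeObject:
    """Stand-in for boto3's s3.Object that records the bytes it receives."""

    def __init__(self):
        self.received = None

    def put(self, Body):
        self.received = Body.read()
        return {"ResponseMetadata": {"HTTPStatusCode": 200}}


# Try the helper against the stand-in, using a temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello s3")
fake = FakeObject()
response = put_file(fake, tmp.name)
os.remove(tmp.name)
print(fake.received, response["ResponseMetadata"]["HTTPStatusCode"])
```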
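Deleting files
def delete_aws_file(file_name, bucket):
    """Delete the object named file_name from the given S3 bucket."""
    try:
        s3_client = boto3.client("s3")
        s3_client.delete_object(Bucket=bucket, Key=file_name)
    except Exception as e:
        # Log the failure instead of propagating it to the caller
        logger.error(e)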
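When many files need to be removed, the S3 client also offers delete_objects, which takes up to 1000 keys per request. The sketch below only builds the request payloads, so it runs without AWS access; build_delete_batches and the key names are hypothetical.

```python
def build_delete_batches(keys, batch_size=1000):
    """Hypothetical helper: split keys into Delete payloads of at most batch_size keys."""
    keys = list(keys)
    for start in range(0, len(keys), batch_size):
        chunk = keys[start:start + batch_size]
        # "Quiet" asks S3 to report only the keys that failed to delete.
        yield {"Objects": [{"Key": key} for key in chunk], "Quiet": True}


batches = list(build_delete_batches(f"scan-{i}.png" for i in range(2500)))
print([len(batch["Objects"]) for batch in batches])

# Usage with a real client (not executed here):
# for payload in batches:
#     s3_client.delete_objects(Bucket=bucket, Delete=payload)
```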
Recognizing text in images
Recognizing key-value documents such as invoices and bills
def get_labels_and_values(result, field):
    # Collect one {label: value} pair from an expense field when both are present
    if "LabelDetection" in field:
        key = field["LabelDetection"].get("Text")
        value = field.get("ValueDetection", {}).get("Text")
        if key and value:
            if key.endswith(":"):
                key = key[:-1]
            result.append({key: value})


def process_text_detection(bucket, document):
    try:
        client = boto3.client("textract", region_name="ap-south-1")
        response = client.analyze_expense(
            Document={"S3Object": {"Bucket": bucket, "Name": document}}
        )
    except Exception as e:
        logger.error(e)
        raise RuntimeError("An unknown error occurred on the aws service") from e

    # A list of single-entry dicts, because the same label can repeat across line items
    result = []
    for expense_doc in response["ExpenseDocuments"]:
        for line_item_group in expense_doc["LineItemGroups"]:
            for line_items in line_item_group["LineItems"]:
                for expense_fields in line_items["LineItemExpenseFields"]:
                    get_labels_and_values(result, expense_fields)
        for summary_field in expense_doc["SummaryFields"]:
            get_labels_and_values(result, summary_field)
    return result


def get_extract_info(bucket, document):
    return process_text_detection(bucket, document)
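To make the nesting of the analyze_expense response easier to follow, here is a sketch that walks a hand-written sample in the same order as the code above (line items first, then summary fields). SAMPLE_RESPONSE is made-up data that keeps only the keys the extraction reads; a real response carries many more fields, such as geometry and confidence. collect_pairs is a hypothetical name.

```python
SAMPLE_RESPONSE = {
    "ExpenseDocuments": [
        {
            "SummaryFields": [
                {"LabelDetection": {"Text": "Total:"}, "ValueDetection": {"Text": "$42.00"}},
                # No printed label on the document, so this field is skipped.
                {"Type": {"Text": "VENDOR_NAME"}, "ValueDetection": {"Text": "Example Store"}},
            ],
            "LineItemGroups": [
                {
                    "LineItems": [
                        {
                            "LineItemExpenseFields": [
                                {"LabelDetection": {"Text": "Item"},
                                 "ValueDetection": {"Text": "Coffee"}},
                            ]
                        }
                    ]
                }
            ],
        }
    ]
}


def collect_pairs(response):
    """Flatten line-item and summary fields into a list of {label: value} dicts."""
    pairs = []

    def add(field):
        label = field.get("LabelDetection", {}).get("Text")
        value = field.get("ValueDetection", {}).get("Text")
        if label and value:
            if label.endswith(":"):
                label = label[:-1]
            pairs.append({label: value})

    for doc in response["ExpenseDocuments"]:
        for group in doc["LineItemGroups"]:
            for item in group["LineItems"]:
                for field in item["LineItemExpenseFields"]:
                    add(field)
        for field in doc["SummaryFields"]:
            add(field)
    return pairs


pairs = collect_pairs(SAMPLE_RESPONSE)
print(pairs)
```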
Plain text recognition
# Analyzes text in a document stored in an S3 bucket. Display polygon box around text and angled text
import io

import boto3
from PIL import Image, ImageDraw


# Draws the outline of a block's bounding box (given as ratios of the image size)
def ShowBoundingBox(draw, box, width, height, boxColor):
    left = width * box['Left']
    top = height * box['Top']
    draw.rectangle([left, top, left + (width * box['Width']), top + (height * box['Height'])], outline=boxColor)


# Fills a block's bounding box, used to mark selected check boxes
def ShowSelectedElement(draw, box, width, height, boxColor):
    left = width * box['Left']
    top = height * box['Top']
    draw.rectangle([left, top, left + (width * box['Width']), top + (height * box['Height'])], fill=boxColor)
# Displays information about a block returned by text detection and text analysis
def DisplayBlockInformation(block):
    print('Id: {}'.format(block['Id']))
    if 'Text' in block:
        print('    Detected: ' + block['Text'])
    print('    Type: ' + block['BlockType'])

    if 'Confidence' in block:
        print('    Confidence: ' + "{:.2f}".format(block['Confidence']) + "%")

    if block['BlockType'] == 'CELL':
        print("    Cell information")
        print("        Column:" + str(block['ColumnIndex']))
        print("        Row:" + str(block['RowIndex']))
        print("        Column Span:" + str(block['ColumnSpan']))
        print("        RowSpan:" + str(block['RowSpan']))

    if 'Relationships' in block:
        print('    Relationships: {}'.format(block['Relationships']))
    print('    Geometry: ')
    print('        Bounding Box: {}'.format(block['Geometry']['BoundingBox']))
    print('        Polygon: {}'.format(block['Geometry']['Polygon']))

    if block['BlockType'] == "KEY_VALUE_SET":
        print('    Entity Type: ' + block['EntityTypes'][0])

    if block['BlockType'] == 'SELECTION_ELEMENT':
        print('    Selection element detected: ', end='')
        if block['SelectionStatus'] == 'SELECTED':
            print('Selected')
        else:
            print('Not selected')

    if 'Page' in block:
        # Page is an integer, so convert it before concatenating
        print('Page: ' + str(block['Page']))
    print()
def process_text_analysis(bucket, document):
    # Get the document from S3
    s3_connection = boto3.resource('s3')
    s3_object = s3_connection.Object(bucket, document)
    s3_response = s3_object.get()

    stream = io.BytesIO(s3_response['Body'].read())
    image = Image.open(stream)

    # Analyze the document
    client = boto3.client('textract')
    image_binary = stream.getvalue()
    response = client.analyze_document(Document={'Bytes': image_binary},
                                       FeatureTypes=["TABLES", "FORMS"])

    ### Alternatively, process using S3 object ###
    # response = client.analyze_document(
    #     Document={'S3Object': {'Bucket': bucket, 'Name': document}},
    #     FeatureTypes=["TABLES", "FORMS"])

    ### To use a local file ###
    # with open("pathToFile", 'rb') as img_file:
    #     ### To display image using PIL ###
    #     image = Image.open()
    #     ### Read bytes ###
    #     img_bytes = img_file.read()
    #     response = client.analyze_document(Document={'Bytes': img_bytes}, FeatureTypes=["TABLES", "FORMS"])

    # Get the text blocks
    blocks = response['Blocks']
    width, height = image.size
    draw = ImageDraw.Draw(image)
    print('Detected Document Text')

    # Create image showing bounding box/polygon the detected lines/text
    for block in blocks:
        DisplayBlockInformation(block)
        if block['BlockType'] == "KEY_VALUE_SET":
            if block['EntityTypes'][0] == "KEY":
                ShowBoundingBox(draw, block['Geometry']['BoundingBox'], width, height, 'red')
            else:
                ShowBoundingBox(draw, block['Geometry']['BoundingBox'], width, height, 'green')

        if block['BlockType'] == 'TABLE':
            ShowBoundingBox(draw, block['Geometry']['BoundingBox'], width, height, 'blue')

        if block['BlockType'] == 'CELL':
            ShowBoundingBox(draw, block['Geometry']['BoundingBox'], width, height, 'yellow')

        if block['BlockType'] == 'SELECTION_ELEMENT':
            if block['SelectionStatus'] == 'SELECTED':
                ShowSelectedElement(draw, block['Geometry']['BoundingBox'], width, height, 'blue')

        # uncomment to draw polygon for all Blocks
        # points = []
        # for polygon in block['Geometry']['Polygon']:
        #     points.append((width * polygon['X'], height * polygon['Y']))
        # draw.polygon((points), outline='blue')

    # Display the image
    image.show()
    return len(blocks)
def main():
    bucket = ''    # name of the S3 bucket that holds the document
    document = ''  # object key of the document inside that bucket
    block_count = process_text_analysis(bucket, document)
    print("Blocks detected: " + str(block_count))


if __name__ == "__main__":
    main()
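If only the recognized text is needed, the drawing code can be skipped and the text read straight from the blocks. The sketch below assumes the response contains LINE blocks alongside the table and form blocks. It also shows the ratio-to-pixel conversion that ShowBoundingBox performs. SAMPLE_BLOCKS is made-up data, and line_texts and to_pixel_box are hypothetical helper names.

```python
def line_texts(blocks):
    """Return the text of every LINE block, in the order the blocks were returned."""
    return [b["Text"] for b in blocks if b["BlockType"] == "LINE" and "Text" in b]


def to_pixel_box(box, width, height):
    """Convert a ratio-based BoundingBox into (left, top, right, bottom) pixels."""
    left = width * box["Left"]
    top = height * box["Top"]
    return (left, top, left + width * box["Width"], top + height * box["Height"])


# Made-up blocks, reduced to the keys used here.
SAMPLE_BLOCKS = [
    {"BlockType": "PAGE"},
    {
        "BlockType": "LINE",
        "Text": "Invoice 001",
        "Geometry": {"BoundingBox": {"Left": 0.25, "Top": 0.5, "Width": 0.5, "Height": 0.125}},
    },
    {"BlockType": "WORD", "Text": "Invoice"},
]

print(line_texts(SAMPLE_BLOCKS))
print(to_pixel_box(SAMPLE_BLOCKS[1]["Geometry"]["BoundingBox"], 800, 400))
```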