Google App Engine: How to write large files to Google Cloud Storage
Problem description
I am trying to save large files from Google App Engine's Blobstore to Google Cloud Storage to facilitate backup.
It works fine for small files (<10 MB), but for larger files it becomes unstable and GAE throws a FileNotOpenedError.
My code:
PATH = '/gs/backupbucket/'

for df in DocumentFile.all():
    fn = df.blob.filename
    br = blobstore.BlobReader(df.blob)
    write_path = files.gs.create(self.PATH + fn.encode('utf-8'),
                                 mime_type='application/zip', acl='project-private')
    with files.open(write_path, 'a') as fp:
        while True:
            buf = br.read(100000)
            if buf == "":
                break
            fp.write(buf)
    files.finalize(write_path)
(This runs in a task queue to avoid exceeding the request execution time limit.)
Throws a FileNotOpenedError:
Traceback (most recent call last):
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1511, in __call__
    rv = self.handle_exception(request, response, e)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1505, in __call__
    rv = self.router.dispatch(request, response)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1253, in default_dispatcher
    return route.handler_adapter(request, response)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1077, in __call__
    return handler.dispatch()
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 547, in dispatch
    return self.handle_exception(e, self.app.debug)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 545, in dispatch
    return method(*args, **kwargs)
  File "/base/data/home/apps/s~simplerepository/1.354754771592783168/processFiles.py", line 249, in post
    fp.write(buf)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 281, in __exit__
    self.close()
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 275, in close
    self._make_rpc_call_with_retry('Close', request, response)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 388, in _make_rpc_call_with_retry
    _make_call(method, request, response)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 236, in _make_call
    _raise_app_error(e)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 179, in _raise_app_error
    raise FileNotOpenedError()
I have investigated further and, according to a comment on GAE issue 5371, the Files API closes the file every 30 seconds. I have not seen this documented anywhere else.
I have tried to work around this by closing and reopening the file at intervals, but now I get a WrongOpenModeError. The code below is edited from the first version of this post: I have added a 0.5-second pause between closing and reopening the file.
My code (updated):
PATH = '/gs/backupbucket/'

for df in DocumentFile.all():
    fn = df.blob.filename
    br = blobstore.BlobReader(df.blob)
    write_path = files.gs.create(self.PATH + fn.encode('utf-8'),
                                 mime_type='application/zip', acl='project-private')
    fp = files.open(write_path, 'a')
    c = 0
    while True:
        if c == 5:
            c = 0
            fp.close()
            files.finalize(write_path)
            time.sleep(0.5)
            fp = files.open(write_path, 'a')
        c = c + 1
        buf = br.read(100000)
        if buf == "":
            break
        fp.write(buf)
    files.finalize(write_path)
Stacktrace:
Traceback (most recent call last):
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1511, in __call__
    rv = self.handle_exception(request, response, e)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1505, in __call__
    rv = self.router.dispatch(request, response)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1253, in default_dispatcher
    return route.handler_adapter(request, response)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1077, in __call__
    return handler.dispatch()
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 547, in dispatch
    return self.handle_exception(e, self.app.debug)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 545, in dispatch
    return method(*args, **kwargs)
  File "/base/data/home/apps/s~simplerepository/1.354894420907462278/processFiles.py", line 267, in get
    fp.write(buf)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 310, in write
    self._make_rpc_call_with_retry('Append', request, response)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 388, in _make_rpc_call_with_retry
    _make_call(method, request, response)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 236, in _make_call
    _raise_app_error(e)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 188, in _raise_app_error
    raise WrongOpenModeError()
I have tried to find information about the WrongOpenModeError, but the only place it is mentioned is in appengine.api.files.file.py itself.
Suggestions on how to get around this so that large files can also be saved to Google Cloud Storage would be greatly appreciated. Thanks!
Answer
I was having the same issue. I ended up writing an iterator around fetch_data and catching the exception; it works, but it is a work-around.
Rewriting your code, it would look something like this:
from google.appengine.ext import blobstore
from google.appengine.api import files


def iter_blobstore(blob, fetch_size=524288):
    start_index = 0
    # fetch_data treats end_index as inclusive, so subtract one here
    # to avoid re-reading the first byte of every following chunk.
    end_index = fetch_size - 1
    while True:
        read = blobstore.fetch_data(blob, start_index, end_index)
        if read == "":
            break
        start_index += fetch_size
        end_index += fetch_size
        yield read


PATH = '/gs/backupbucket/'

for df in DocumentFile.all():
    fn = df.blob.filename
    write_path = files.gs.create(self.PATH + fn.encode('utf-8'),
                                 mime_type='application/zip', acl='project-private')
    with files.open(write_path, 'a') as fp:
        for buf in iter_blobstore(df.blob):
            try:
                fp.write(buf)
            except files.FileNotOpenedError:
                # The Files API has closed the file behind our back
                # (the ~30-second behaviour mentioned above); the
                # chunk that failed is simply skipped here.
                pass
    files.finalize(write_path)
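One caveat with the bare except/pass above: any chunk whose write raises FileNotOpenedError is silently dropped, so the copy in Cloud Storage can end up incomplete. A variation that should avoid this is to re-open the write path and retry the failed chunk. This is only a sketch, assuming the file has not been finalized yet (a finalized file cannot be opened for appending); write_with_reopen is a hypothetical helper, not part of the Files API:

def write_with_reopen(write_path, fp, buf):
    # Append buf, re-opening the file once if the Files API has
    # auto-closed it (reported to happen roughly every 30 seconds).
    try:
        fp.write(buf)
    except files.FileNotOpenedError:
        # Re-opening in append mode only works while the file has not
        # been finalized, so finalize exactly once, at the very end.
        fp = files.open(write_path, 'a')
        fp.write(buf)
    return fp


fp = files.open(write_path, 'a')
for buf in iter_blobstore(df.blob):
    fp = write_with_reopen(write_path, fp, buf)
fp.close()
files.finalize(write_path)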