How to add attributes to a pandas dataframe that is stored as a group in a HDF5 file?

Problem description

I have a multidimensional (multi-indexed) pandas DataFrame, created like this:

import numpy as np
import pandas as pd
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
mindex = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(8, 4), index=mindex)
store = pd.HDFStore("df.h5")
store["df"] = df
store.close()

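I want to add attributes to the df stored in the HDFStore. How can I do this? There does not seem to be any documentation on attributes, and the group used to store the df is not the same type as the HDF5 group in the h5py module: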

type(list(store.groups())[0])
Out[24]: tables.group.Group

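It seems to be a PyTables group, and the only member function I found that involves attributes is this private one, which deals with a different kind of attribute: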

__setattr__(self, name, value)
 |      Set a Python attribute called name with the given value.

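What I want is simply to store a set of DataFrames with multidimensional indices that are "tagged" by attributes in a structured way, so that I can compare them and sub-select them based on those attributes.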

Basically, what HDF5 is meant to be used for, combined with multidimensional DataFrames from pandas.

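There are questions like this one that deal with reading HDF5 files with readers other than pandas, but they all involve DataFrames with one-dimensional indices, which makes it easy to simply dump the NumPy ndarrays and store the index separately.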

Recommended answer

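I have not received any answers so far, so this is what I managed to do using the pandas and h5py modules: pandas is used to store and read the multidimensional DataFrame, and h5py is used to store and read the attributes of the HDF5 group: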

import numpy as np
import pandas as pd
import h5py

# Create a random multidim DataFrame
iterables = [['bar', 'baz', 'foo', 'qux'], ['one', 'two']]
mindex = pd.MultiIndex.from_product(iterables, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(8, 4), index=mindex)

# Dump the data with pandas; closing the store flushes the file to disk
with pd.HDFStore("df.h5", mode="w") as pdStore:
    pdStore["df"] = df

# Store the attribute with h5py. Mode "a" opens the file read/write;
# recent h5py versions open files read-only when no mode is given.
with h5py.File("df.h5", "a") as h5pyFile:
    h5pyFile["/df"].attrs["number"] = 1

# Read the data conditionally based on stored attributes
readDf = pd.DataFrame()
with h5py.File("df.h5", "r") as h5pyFile:
    matches = h5pyFile["/df"].attrs["number"] == 1
if matches:
    with pd.HDFStore("df.h5", mode="r") as pdStore:
        readDf = pdStore["/df"]

# All zeros if the round trip preserved the data
print(readDf - df)
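
The same idea can be extended to the sub-selection described in the question. The following is only a rough sketch, not a tested recipe: the file location, the keys `run_a`, `run_b`, `run_c` and the attribute name `number` are made up for illustration, and the file is deliberately opened by only one library at a time.

```python
import os
import tempfile

import h5py
import numpy as np
import pandas as pd

# Hypothetical file location and tags, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "tagged.h5")
tags = {"run_a": 1, "run_b": 2, "run_c": 1}

mindex = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
frames = {
    key: pd.DataFrame(np.random.randn(4, 3), index=mindex) for key in tags
}

# Step 1: pandas writes every DataFrame as its own group.
with pd.HDFStore(path, mode="w") as store:
    for key, frame in frames.items():
        store[key] = frame

# Step 2: h5py tags each group after pandas has closed the file.
with h5py.File(path, "a") as f:
    for key, number in tags.items():
        f[key].attrs["number"] = number

# Step 3: collect the keys whose attribute matches the condition.
with h5py.File(path, "r") as f:
    selected_keys = sorted(
        key for key in f.keys() if f[key].attrs.get("number") == 1
    )

# Step 4: pandas reads back only the selected DataFrames.
selected = {key: pd.read_hdf(path, key) for key in selected_keys}
print(selected_keys)
```

Only the groups whose `number` attribute equals 1 are loaded, so the remaining DataFrames are never read from disk.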

I still don't know whether there are any issues with handling the same file with both h5py and pandas.
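
One way to sidestep that concern might be to stay within pandas. As far as I know, `HDFStore.get_storer(key).attrs` exposes the PyTables attribute set of the group (the group's `_v_attrs`), so attributes can be written and read without h5py. This is a minimal sketch under that assumption; the temporary path and the attribute name `number` are chosen for illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical file location, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "df_attrs.h5")

iterables = [["bar", "baz", "foo", "qux"], ["one", "two"]]
mindex = pd.MultiIndex.from_product(iterables, names=["first", "second"])
df = pd.DataFrame(np.random.randn(8, 4), index=mindex)

# Write the DataFrame and attach an attribute to its group.
with pd.HDFStore(path, mode="w") as store:
    store["df"] = df
    store.get_storer("df").attrs.number = 1

# Read the attribute first, then load the data only if it matches.
with pd.HDFStore(path, mode="r") as store:
    number = store.get_storer("df").attrs.number
    readDf = store["df"] if number == 1 else pd.DataFrame()

print(number)
```

Because only one library touches the file, there is no question of two HDF5 readers holding it open at once.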
