4. API Reference
Most of the work in this package was derived from the specification for OLE Compound Document files published by OpenOffice, and the specification for the Advanced Authoring Format (AAF) published by Microsoft.
4.1. CompoundFileReader

class compoundfiles.CompoundFileReader(filename_or_obj)

Provides an interface for reading OLE Compound Document files.

The CompoundFileReader class provides a relatively simple interface for interpreting the content of Microsoft’s OLE Compound Document files. These files can be thought of as a file-system in a file (or a loop-mounted FAT file-system for Unix folk).

The class can be constructed with a filename or a file-like object. In the latter case, the object must support the read, seek, and tell methods. For optimal usage, it should also provide a valid file descriptor in response to a call to fileno, but this is not mandatory.

The root attribute represents the root storage entity in the compound document. An open() method is provided which, given a CompoundFileEntity instance representing a stream, returns a file-like object representing the content of the stream.

Finally, the context manager protocol is also supported, permitting usage of the class like so:

    from compoundfiles import CompoundFileReader

    with CompoundFileReader('foo.doc') as doc:
        # Iterate over items in the root directory of the compound document
        for entry in doc.root:
            # If any entry is a file, attempt to read the data from it
            if entry.isfile:
                with doc.open(entry) as f:
                    data = f.read()
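
The requirements on a file-like argument (read, seek, and tell mandatory, fileno optional) can be checked up front. The following is a minimal sketch, not part of the package: check_file_like and REQUIRED_METHODS are hypothetical names, and io.BytesIO is used only as a convenient in-memory object to test against.

```python
import io

# Methods the text above lists as mandatory for a file-like argument.
REQUIRED_METHODS = ("read", "seek", "tell")


def check_file_like(obj):
    """Return (missing_methods, has_file_descriptor) for a candidate object.

    Hypothetical helper for illustration; it is not provided by compoundfiles.
    """
    missing = [
        name for name in REQUIRED_METHODS
        if not callable(getattr(obj, name, None))
    ]
    try:
        obj.fileno()
        has_fd = True
    except (AttributeError, OSError):
        # io.UnsupportedOperation, raised by in-memory streams, is an OSError.
        has_fd = False
    return missing, has_fd


# An in-memory buffer has all three mandatory methods but no file descriptor.
print(check_file_like(io.BytesIO(b"example")))  # ([], False)
```

An object that reports no missing methods but no file descriptor should still be acceptable according to the description above, just possibly with less efficient access.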

root

The root attribute represents the root storage entity in the compound document. As a CompoundFileEntity instance, it (and child storages) can be enumerated, or accessed by index or by name (like a dict), to obtain CompoundFileEntity instances representing the content of the compound document.

Both CompoundFileReader and CompoundFileEntity support human-readable representations, making it relatively simple to browse and extract information from compound documents simply by using the interactive Python command line.
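
Because storages can be enumerated, a whole document can be traversed recursively. The sketch below relies only on the documented isdir, isfile, and name attributes plus iteration; walk_streams is a hypothetical helper, and building paths with / separators relative to the root is an assumption about what open() accepts.

```python
def walk_streams(storage, prefix=""):
    """Yield (path, entity) pairs for every stream below a storage.

    Hypothetical helper: it only assumes that `storage` is iterable and that
    each child exposes `name`, `isdir`, and `isfile`.
    """
    for child in storage:
        # Join nested names with '/' (assumed to match the path style of open()).
        path = prefix + "/" + child.name if prefix else child.name
        if child.isdir:
            # Descend into child storages.
            yield from walk_streams(child, path)
        elif child.isfile:
            yield path, child


# Usage, assuming `doc` is an open CompoundFileReader:
#     for path, entity in walk_streams(doc.root):
#         print(path, entity.size)
```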

open(filename_or_entity)

Return a file-like object with the content of the specified entity.

Given a CompoundFileEntity instance which represents a stream, or a string representing the path to one (using / separators), this method returns an instance of CompoundFileStream which can be used to read the content of the stream.
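
As an illustration of open(), a stream can be copied out of a document into an ordinary file. This is a sketch under assumptions: extract_stream is a hypothetical helper, and the file and stream names in the usage comment are placeholders rather than names any real document is known to contain.

```python
import shutil


def extract_stream(doc, target, destination):
    """Copy one stream out of a compound document into a writable binary file.

    `doc` is expected to behave like CompoundFileReader; `target` is either an
    entity or a '/'-separated path, as described for open() above.
    """
    with doc.open(target) as stream:
        # copyfileobj reads in blocks, so large streams are not loaded at once.
        shutil.copyfileobj(stream, destination)


# Usage (placeholder names):
#     from compoundfiles import CompoundFileReader
#     with CompoundFileReader('foo.doc') as doc, open('out.bin', 'wb') as out:
#         extract_stream(doc, 'Some Storage/Some Stream', out)
```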

4.2. CompoundFileStream

class compoundfiles.CompoundFileStream

Abstract base class for streams within an OLE Compound Document.

Instances of CompoundFileStream are not constructed directly, but are returned by the CompoundFileReader.open() method. They support all common methods associated with read-only streams (read(), seek(), tell(), and so forth).

read(n=-1)

Read up to n bytes from the stream and return them. As a convenience, if n is unspecified or -1, readall() is called. Fewer than n bytes may be returned if there are fewer than n bytes from the current stream position to the end of the stream.

If 0 bytes are returned, and n was not 0, this indicates end of the stream.
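
The end-of-stream rule above lends itself to a simple chunked read loop. In this sketch iter_chunks is a hypothetical helper, and io.BytesIO stands in for a CompoundFileStream on the assumption that both follow the same read() contract.

```python
import io


def iter_chunks(stream, chunk_size=4096):
    """Yield successive blocks from a readable stream until end of stream."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            # Zero bytes for a non-zero request signals end of stream.
            break
        yield chunk


# io.BytesIO is a stand-in for a stream returned by CompoundFileReader.open().
demo = io.BytesIO(b"abcdefg")
print(list(iter_chunks(demo, 3)))  # [b'abc', b'def', b'g']
```

The final chunk is shorter than requested because fewer than n bytes remained, which is the short-read case the description mentions.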

read1(n=-1)

Read up to n bytes from the stream using only a single call to the underlying object.

In the case of CompoundFileStream this roughly corresponds to returning the content from the current position up to the end of the current sector.

seek(offset, whence=0)

Change the stream position to the given byte offset. offset is interpreted relative to the position indicated by whence. Values for whence are:

    SEEK_SET or 0 - start of the stream (the default); offset should be zero or positive
    SEEK_CUR or 1 - current stream position; offset may be negative
    SEEK_END or 2 - end of the stream; offset is usually negative

Return the new absolute position.
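
The three whence values can be combined to measure a stream or read its tail. The helpers below (stream_length and read_tail) are hypothetical and only use seek(), tell(), and read(); io.BytesIO again stands in for a CompoundFileStream, assuming the same seek semantics.

```python
import io


def stream_length(stream):
    """Return the total length of a seekable stream, preserving its position."""
    position = stream.tell()
    try:
        # seek() returns the new absolute position, so seeking to the end
        # yields the length directly.
        return stream.seek(0, io.SEEK_END)
    finally:
        # Restore the caller's position with an absolute seek.
        stream.seek(position, io.SEEK_SET)


def read_tail(stream, count):
    """Return the last `count` bytes, or the whole stream if it is shorter."""
    length = stream_length(stream)
    # A negative offset relative to the end moves backwards from the end.
    stream.seek(-min(count, length), io.SEEK_END)
    return stream.read()


demo = io.BytesIO(b"0123456789")
print(stream_length(demo))  # 10
print(read_tail(demo, 3))   # b'789'
```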

4.3. CompoundFileEntity

class compoundfiles.CompoundFileEntity(parent, stream, index)

Represents an entity in an OLE Compound Document.

An entity in an OLE Compound Document can be a “stream” (analogous to a file in a file-system) which has a size and can be opened by a call to the parent object’s open() method. Alternatively, it can be a “storage” (analogous to a directory in a file-system), which has no size but has created and modified time-stamps, and can contain other streams and storages.

If the entity is a storage, it will act as an iterable read-only sequence, indexable by ordinal or by name, and compatible with the in operator and built-in len() function.
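
The sequence behaviour of a storage can be sketched with two small helpers. Both find_child and summarize are hypothetical, and the first one assumes that the in operator accepts a child name (the text above does not say whether it tests names, entities, or both).

```python
def find_child(storage, name):
    """Return the child called `name`, or None if the storage has no such child.

    Assumption: `name in storage` performs a lookup by child name.
    """
    if name in storage:
        return storage[name]  # index by name
    return None


def summarize(storage):
    """Return (child_count, first_child_name) using len() and ordinal indexing."""
    count = len(storage)
    first = storage[0].name if count else None  # index by ordinal
    return count, first


# Usage, assuming `doc` is an open CompoundFileReader:
#     print(summarize(doc.root))
#     entity = find_child(doc.root, 'Some Stream')  # placeholder name
```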

created

For storage entities (where isdir is True), this returns the creation date of the storage. Returns None for stream entities.

isdir

Returns True if this is a storage entity which can contain other entities.

isfile

Returns True if this is a stream entity which can be opened.

modified

For storage entities (where isdir is True), this returns the last modification date of the storage. Returns None for stream entities.

name

Returns the name of the entity. This can be up to 31 characters long and may contain any character representable in UTF-16 except the NULL character. Names are considered case-insensitive for comparison purposes.
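
The naming constraints above can be expressed as two small checks. This is a sketch built on assumptions, not the package's own logic: is_valid_name and same_name are hypothetical helpers, the 31-character limit is counted in UTF-16 code units, and casefold() is used as an approximation of the case-insensitive comparison because the exact rule is not stated here.

```python
MAX_NAME_UNITS = 31  # limit quoted in the description above


def is_valid_name(name):
    """Check the length and NULL-character constraints on an entity name."""
    # Assumption: the limit applies to UTF-16 code units, so characters
    # outside the Basic Multilingual Plane count as two.
    units = len(name.encode("utf-16-le")) // 2
    return units <= MAX_NAME_UNITS and "\x00" not in name


def same_name(first, second):
    """Compare two entity names without regard to case."""
    # Assumption: casefold() approximates the library's comparison rule.
    return first.casefold() == second.casefold()


print(is_valid_name("a" * 32))                    # False
print(same_name("WordDocument", "WORDDOCUMENT"))  # True
```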