pyxet module#
Login#
Open#
XetFS#
- class pyxet.XetFS(*args, **kwargs)[source]#
Bases: AbstractFileSystem
Inherited Members
- cat(path, recursive=False, on_error='raise', **kwargs)#
Fetch (potentially multiple) paths’ contents
- Parameters:
recursive (bool) – If True, assume the path(s) are directories, and get all the contained files
on_error ("raise", "omit", "return") – If raise, an underlying exception will be raised (converted to KeyError if the type is in self.missing_exceptions); if omit, keys with exception will simply not be included in the output; if “return”, all keys are included in the output, but the value will be bytes or an exception instance.
kwargs – passed to cat_file
- Returns:
dict of {path: contents} if there are multiple paths or the path has been otherwise expanded
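The three on_error modes can be illustrated with a toy model over an in-memory dict. This is only a sketch of the documented behavior, not the fsspec or pyxet implementation; `cat_many` and the sample `store` are hypothetical names introduced for illustration.

```python
def cat_many(store, paths, on_error="raise"):
    """Toy model of cat(): fetch several paths from an in-memory dict.

    `store` stands in for a filesystem; a missing key plays the role of
    a missing file. Not the real fsspec/pyxet implementation.
    """
    if on_error not in ("raise", "omit", "return"):
        raise ValueError(f"unsupported on_error value: {on_error!r}")
    out = {}
    for path in paths:
        try:
            out[path] = store[path]
        except KeyError as exc:
            if on_error == "raise":
                raise  # propagate the underlying exception
            if on_error == "return":
                out[path] = exc  # keep the key, value is the exception
            # on_error == "omit": leave the key out of the result
    return out


store = {"a.txt": b"alpha", "b.txt": b"beta"}
omitted = cat_many(store, ["a.txt", "missing.txt"], on_error="omit")
returned = cat_many(store, ["a.txt", "missing.txt"], on_error="return")
```

With "omit" the result only contains `a.txt`; with "return" both keys are present and the value for `missing.txt` is an exception instance.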
- cat_file(path, start=None, end=None, **kwargs)#
Get the content of a file
- Parameters:
path – URL of file on this filesystem
start, end (int) – Byte limits of the read. If negative, counted backwards from the end, like usual Python slices. Either can be None for start or end of file, respectively.
kwargs – passed to open()
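Since start and end are described as behaving like Python slice bounds, their meaning can be sketched with a plain bytes slice. `read_range` is a hypothetical helper used only for illustration; it does not call pyxet.

```python
from typing import Optional


def read_range(data: bytes, start: Optional[int] = None,
               end: Optional[int] = None) -> bytes:
    """Return data[start:end], mirroring the slice-like semantics
    described for cat_file (illustrative stand-in only)."""
    return data[start:end]


payload = b"hello world"
first_five = read_range(payload, 0, 5)       # b"hello"
last_five = read_range(payload, -5, None)    # b"world"
all_but_tail = read_range(payload, None, -6) # b"hello"
whole = read_range(payload)                  # entire payload
```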
- checksum(path)#
Unique value for current version of file
If the checksum is the same from one moment to another, the contents are guaranteed to be the same. If the checksum changes, the contents might have changed.
This should normally be overridden; default will probably capture creation/modification timestamp (which would be good) or maybe access timestamp (which would be bad)
- copy(path1, path2, recursive=False, maxdepth=None, on_error=None, **kwargs)#
Copy between two locations in the filesystem
- Parameters:
on_error ("raise", "ignore") – If raise, any not-found exceptions will be raised; if ignore, any not-found exceptions will cause the path to be skipped; defaults to raise unless recursive is true, where the default is ignore
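The default for on_error depends on recursive. A minimal sketch of that rule follows; `resolve_on_error` is a hypothetical helper name, not part of pyxet or fsspec.

```python
from typing import Optional


def resolve_on_error(on_error: Optional[str], recursive: bool) -> str:
    """Pick the effective on_error mode as described for copy():
    'raise' by default, 'ignore' by default when recursive is true."""
    if on_error is None:
        return "ignore" if recursive else "raise"
    if on_error not in ("raise", "ignore"):
        raise ValueError(f"unsupported on_error value: {on_error!r}")
    return on_error


default_flat = resolve_on_error(None, recursive=False)
default_recursive = resolve_on_error(None, recursive=True)
explicit = resolve_on_error("raise", recursive=True)
```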
- delete(path, recursive=False, maxdepth=None)#
Alias of AbstractFileSystem.rm.
- download(rpath, lpath, recursive=False, **kwargs)#
Alias of AbstractFileSystem.get.
- get(rpath, lpath, recursive=False, callback=<fsspec.callbacks.NoOpCallback object>, maxdepth=None, **kwargs)#
Copy file(s) to local.
Copies a specific file or tree of files (if recursive=True). If lpath ends with a “/”, it will be assumed to be a directory, and target files will go within. Can submit a list of paths, which may be glob-patterns and will be expanded.
Calls get_file for each source.
- get_file(rpath, lpath, callback=<fsspec.callbacks.NoOpCallback object>, outfile=None, **kwargs)#
Copy single remote file to local
- glob(path, maxdepth=None, **kwargs)#
Find files by glob-matching.
If the path ends with ‘/’, only folders are returned.
We support "**", "?" and "[..]". We do not support ^ for pattern negation.
The maxdepth option is applied on the first ** found in the path.
kwargs are passed to ls.
- head(path, size=1024)#
Get the first size bytes from file
- isfile(path)#
Is this entry file-like?
- put(lpath, rpath, recursive=False, callback=<fsspec.callbacks.NoOpCallback object>, maxdepth=None, **kwargs)#
Copy file(s) from local.
Copies a specific file or tree of files (if recursive=True). If rpath ends with a “/”, it will be assumed to be a directory, and target files will go within.
Calls put_file for each source.
- __init__(endpoint=None, **storage_options)[source]#
Opens the Xet endpoint as an fsspec file system handle, providing read-only operations such as ls, glob, and open. Writes must be performed within a transaction (see start_transaction()).
User and token are needed for private repositories and they can be set with pyxet.login.
Examples:
    import pyxet

    fs = pyxet.XetFS('xethub.com')

    # List files.
    fs.ls('XetHub/Flickr30k/main')

    # Read the contents of a file as bytes.
    b = fs.open('XetHub/Flickr30k/main/results.csv').read()

The Xet repository endpoint can be set with the ‘endpoint’ argument or the XET_ENDPOINT environment variable. The default endpoint is xethub.com if unspecified.
- add_deduplication_hints(path_urls)[source]#
Fetches and downloads all of the metadata needed for binary deduplication against all the paths given by path_urls. Once fetched, new data will be deduplicated against any binary content at those paths.
- branch_info(url)[source]#
Returns information about a branch given as user/repo/branch or xet://[endpoint:]<user>/<repo>/<branch>
- cp_file(path1, path2, *args, **kwargs)[source]#
Copies a file or directory from a xet path to another xet path.
Copies must be performed within the context of a transaction and are allowed to span branches.
- end_transaction()[source]#
Finish write transaction, non-context version. See start_transaction().
- info(url)[source]#
Returns information about a path given as user/repo/branch/[path] or xet://[endpoint:]<user>/<repo>/<branch>/[path]
- list_branches(path, raw=False, **kwargs)[source]#
Lists the branches for a path of the form user/repo or xet://[endpoint:]<user>/<repo>
- list_repos(url, raw=False, **kwargs)[source]#
Lists the repos available for a path of the form user or xet://[endpoint:]<user>
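The path forms accepted by branch_info, info, list_branches, and list_repos share one layout: an optional xet://[endpoint:] prefix followed by user, repo, branch, and path components. The sketch below splits such a string using only the standard library. `XetPath` and `split_xet_path` are hypothetical names for illustration; this is not pyxet.parse_url, and it assumes branch names contain no "/".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class XetPath:
    endpoint: Optional[str]
    user: str
    repo: Optional[str]
    branch: Optional[str]
    path: str


def split_xet_path(url: str) -> XetPath:
    """Split 'user/repo/branch/path' or
    'xet://[endpoint:]user/repo/branch/path' into components.
    Illustrative only; assumes branch names have no slashes."""
    endpoint = None
    rest = url
    if rest.startswith("xet://"):
        rest = rest[len("xet://"):]
        head, sep, tail = rest.partition("/")
        if ":" in head:
            # "endpoint:user" -> separate the endpoint from the user
            endpoint, _, user = head.partition(":")
            rest = user + sep + tail
    parts = [p for p in rest.split("/") if p]
    if not parts:
        raise ValueError(f"no user component in {url!r}")
    return XetPath(
        endpoint=endpoint,
        user=parts[0],
        repo=parts[1] if len(parts) > 1 else None,
        branch=parts[2] if len(parts) > 2 else None,
        path="/".join(parts[3:]),
    )


full = split_xet_path("xet://xethub.com:XetHub/Flickr30k/main/results.csv")
repo_only = split_xet_path("XetHub/Flickr30k")
```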
- ls(path: str, detail=True, **kwargs)[source]#
List objects at path. This should include subdirectories and files at that location. The difference between a file and a directory must be clear when details are requested. The specific keys, or perhaps a FileInfo class, or similar, is TBD, but must be consistent across implementations. Must include:
full path to the entry (without protocol)
size of the entry, in bytes. If the value cannot be determined, will be None.
type of entry, “file”, “directory” or other
Additional information may be present, appropriate to the file-system, e.g., generation, checksum, etc. May use refresh=True|False to allow use of self._ls_from_cache, which is useful where listing may be expensive.
- Parameters:
path – str
detail – bool. If True, gives a list of dictionaries, where each is the same as the result of info(path). If False, gives a list of paths (str).
kwargs – may have additional backend-specific options, such as version information
- Returns:
List of strings if detail is False, or list of directory information dicts if detail is True. These dicts would have: name (full path in the FS), size (in bytes), type (file, directory, or something else) and other FS-specific keys.
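The two return shapes can be sketched with made-up entries. The listing data and the `shape_listing` helper below are hypothetical and only show the documented name/size/type keys; a real listing may carry additional filesystem-specific keys.

```python
# Hypothetical entries carrying the three required keys.
ENTRIES = [
    {"name": "user/repo/main/data", "size": None, "type": "directory"},
    {"name": "user/repo/main/hello.txt", "size": 11, "type": "file"},
]


def shape_listing(entries, detail=True):
    """Return info dicts when detail is True, else just the paths."""
    if detail:
        return [dict(entry) for entry in entries]
    return [entry["name"] for entry in entries]


detailed = shape_listing(ENTRIES, detail=True)
names = shape_listing(ENTRIES, detail=False)
files_only = [e["name"] for e in detailed if e["type"] == "file"]
```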
- mv(path1, path2, *args, **kwargs)[source]#
Moves a file or directory from a xet path to another xet path.
Moves must be performed within the context of a transaction and must be within the same branch.
- rm(path, *args, **kwargs)[source]#
Delete a file.
Deletions must be performed within the context of a transaction which must be scoped to within a single repository branch.
- start_transaction(commit_message=None)[source]#
Begin a write transaction for a repository and branch. The entire transaction is committed atomically at the end of the transaction. All writes must be performed into this branch
repo_and_branch is of the form <user>/<repo>/<branch> or xet://[endpoint:]<user>/<repo>/<branch>:
    fs.start_transaction('my commit message')
    file = fs.open('user/repo/main/hello.txt', 'w')
    file.write('hello world')
    file.close()
    fs.end_transaction()

The transaction object is an instance of MultiCommitTransaction.
- property transaction#
Begin a transaction context for a given repository and branch. The entire transaction is committed atomically at the end of the transaction. All writes must be performed into this branch:
    with fs.transaction as tr:
        tr.set_commit_message("message")
        file = fs.open('<user>/<repo>/main/hello.txt', 'w')
        file.write('hello world')
        file.close()

The transaction object is an instance of MultiCommitTransaction.
MultiCommitTransaction#
- class pyxet.MultiCommitTransaction(fs, commit_message=None)[source]#
Handles a commit using the transaction interface. This transaction handler supports transactions across multiple branches by tracking them separately. Simultaneous changes across branches will require multiple actual transactions to complete.
- complete(commit=True)[source]#
Finalizes and commits or cancels this transaction. The transaction can be restarted with start()
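The commit-or-cancel behavior of complete() can be pictured with a toy transaction that buffers writes and applies them all at once. `ToyTransaction` is a hypothetical stand-in, not MultiCommitTransaction. It also assumes that leaving a `with` block because of an exception cancels the transaction, which is a common pattern but is not stated in this reference.

```python
class ToyTransaction:
    """Toy model: writes are buffered and applied together on commit."""

    def __init__(self, store):
        self.store = store   # dict standing in for a branch
        self.pending = {}    # buffered writes, not yet visible

    def write(self, path, data):
        self.pending[path] = data

    def complete(self, commit=True):
        # Apply all buffered writes at once, or drop them on cancel.
        if commit:
            self.store.update(self.pending)
        self.pending.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Assumption: commit only when the block exits without an error.
        self.complete(commit=exc_type is None)
        return False


store = {}
with ToyTransaction(store) as tr:
    tr.write("hello.txt", b"hello world")

cancelled = ToyTransaction(store)
cancelled.write("draft.txt", b"unfinished")
cancelled.complete(commit=False)
```

After this runs, `store` holds `hello.txt` but not `draft.txt`, because the second transaction was cancelled.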
- copy(src_repo_info, dest_repo_info)[source]#
Copies a file from src to dest. src_repo_info and dest_repo_info are the returned values from pyxet.parse_url(url)
- mv(src_repo_info, dest_repo_info)[source]#
Moves a file from src to dest. src_repo_info and dest_repo_info are the returned values from pyxet.parse_url(url)