pyxet module#

Login#

pyxet.login(token, email=None, host=None)[source]#

Sets the active login credentials used to authenticate against XetHub.

Open#

pyxet.open(mode='rb', **kwargs)[source]#

Open the file at the specified Xet file URL, which has the form xet://<repo_user>/<repo_name>/<branch>/<path-to-file>:

f = pyxet.open('xet://XetHub/Flickr30k/main/results.csv')
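The URL layout above can be illustrated with a small stdlib-only sketch. `split_xet_url` and `XetPath` are hypothetical names written for this example and are not part of pyxet; the sketch also assumes the branch name contains no "/".

```python
from typing import NamedTuple


class XetPath(NamedTuple):
    """Components of a xet://<repo_user>/<repo_name>/<branch>/<path-to-file> URL."""
    user: str
    repo: str
    branch: str
    path: str


def split_xet_url(url: str) -> XetPath:
    """Hypothetical helper: split a Xet file URL into its four components."""
    prefix = "xet://"
    if not url.startswith(prefix):
        raise ValueError(f"expected a URL starting with {prefix!r}: {url!r}")
    # Split at most three times so that the file path keeps its own slashes.
    parts = url[len(prefix):].split("/", 3)
    if len(parts) < 4 or not all(parts):
        raise ValueError(f"expected xet://<user>/<repo>/<branch>/<path>: {url!r}")
    return XetPath(*parts)


parsed = split_xet_url("xet://XetHub/Flickr30k/main/results.csv")
print(parsed)
```

Splitting with a fixed maximum keeps nested file paths such as `dir/file.txt` intact in the last component.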

XetFS#

class pyxet.XetFS(*args, **kwargs)[source]#

Bases: AbstractFileSystem

Inherited Members

cat(path, recursive=False, on_error='raise', **kwargs)#

Fetch (potentially multiple) paths’ contents

Parameters:
  • recursive (bool) – If True, assume the path(s) are directories, and get all the contained files

  • on_error ("raise", "omit", "return") – If "raise", an underlying exception will be raised (converted to KeyError if the type is in self.missing_exceptions); if "omit", keys with exceptions will simply not be included in the output; if "return", all keys are included in the output, but the value will be bytes or an exception instance.

  • kwargs (passed to cat_file)

Returns:

  • dict of {path: contents} if there are multiple paths or the path has been otherwise expanded
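The three on_error modes can be sketched with a plain dictionary standing in for the filesystem. `cat_sketch` is a hypothetical function written for illustration; it is not the fsspec or pyxet implementation and only mirrors the documented behaviour for missing paths.

```python
def cat_sketch(store, paths, on_error="raise"):
    """Illustrative stand-in for cat(): `store` maps path -> bytes."""
    if on_error not in ("raise", "omit", "return"):
        raise ValueError(f"unknown on_error mode: {on_error!r}")
    out = {}
    for path in paths:
        try:
            out[path] = store[path]
        except KeyError as exc:
            if on_error == "raise":
                raise
            if on_error == "return":
                # Keep the key, but store the exception instead of bytes.
                out[path] = exc
            # "omit": the failing path is simply left out of the result.
    return out


store = {"a.txt": b"alpha", "b.txt": b"beta"}
omitted = cat_sketch(store, ["a.txt", "missing.txt"], on_error="omit")
returned = cat_sketch(store, ["a.txt", "missing.txt"], on_error="return")
```

With "return", callers have to check each value with isinstance before treating it as bytes.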

cat_file(path, start=None, end=None, **kwargs)#

Get the content of a file

Parameters:
  • path (str) – URL of a file on this filesystem

  • start (int) – Byte limit for the start of the read. If negative, counts backwards from the end, like usual Python slices. None means the start of the file.

  • end (int) – Byte limit for the end of the read. If negative, counts backwards from the end, like usual Python slices. None means the end of the file.

  • kwargs (passed to open().)
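Because start and end follow Python slice conventions, their effect can be previewed on an in-memory bytes object. `read_range` is a hypothetical helper for this illustration and does not touch any filesystem.

```python
data = b"0123456789"


def read_range(buf: bytes, start=None, end=None) -> bytes:
    """Mimic the documented start/end semantics with an ordinary slice."""
    return buf[start:end]


first_four = read_range(data, end=4)     # from the start of the file up to byte 4
last_three = read_range(data, start=-3)  # negative start counts back from the end
middle = read_range(data, 2, -2)         # drop two bytes from each side
whole = read_range(data)                 # both None: the entire content
```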

checksum(path)#

Unique value for current version of file

If the checksum is the same from one moment to another, the contents are guaranteed to be the same. If the checksum changes, the contents might have changed.

This should normally be overridden; default will probably capture creation/modification timestamp (which would be good) or maybe access timestamp (which would be bad)

copy(path1, path2, recursive=False, maxdepth=None, on_error=None, **kwargs)#

Copy between two locations in the filesystem

Parameters:
  • on_error ("raise", "ignore") – If "raise", any not-found exceptions will be raised; if "ignore", any not-found exceptions will cause the path to be skipped. Defaults to "raise" unless recursive is True, in which case the default is "ignore".
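The default for on_error depends on recursive, which is easy to misread. A minimal sketch of that rule follows; `resolve_on_error` is a hypothetical name used only here.

```python
def resolve_on_error(on_error=None, recursive=False):
    """Return the effective on_error mode described for copy()."""
    if on_error is None:
        # No explicit choice: recursive copies skip missing paths,
        # single-path copies raise.
        return "ignore" if recursive else "raise"
    return on_error


default_single = resolve_on_error()
default_recursive = resolve_on_error(recursive=True)
explicit = resolve_on_error("raise", recursive=True)
```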

delete(path, recursive=False, maxdepth=None)#

Alias of AbstractFileSystem.rm.

download(rpath, lpath, recursive=False, **kwargs)#

Alias of AbstractFileSystem.get.

get(rpath, lpath, recursive=False, callback=<fsspec.callbacks.NoOpCallback object>, maxdepth=None, **kwargs)#

Copy file(s) to local.

Copies a specific file or tree of files (if recursive=True). If lpath ends with a “/”, it will be assumed to be a directory, and target files will go within. Can submit a list of paths, which may be glob-patterns and will be expanded.

Calls get_file for each source.

get_file(rpath, lpath, callback=<fsspec.callbacks.NoOpCallback object>, outfile=None, **kwargs)#

Copy single remote file to local

glob(path, maxdepth=None, **kwargs)#

Find files by glob-matching.

If the path ends with ‘/’, only folders are returned.

We support "**", "?" and "[..]". We do not support ^ for pattern negation.

The maxdepth option is applied on the first ** found in the path.

kwargs are passed to ls.
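The wildcard forms listed above behave much like those in Python's standard glob module, so they can be tried locally. This sketch uses stdlib glob on a temporary directory as an analogy only: it is not the filesystem's own glob, and one difference is that the stdlib version needs recursive=True for "**" to span directories. The file names and the `matches` helper are made up for the example.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build a tiny tree of empty files to match against.
    for rel in ("data/a.csv", "data/b.csv", "data/raw/c.csv", "data/raw/notes.txt"):
        full = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()

    def matches(pattern, recursive=False):
        """Return matches relative to root, sorted, with forward slashes."""
        hits = glob.glob(os.path.join(root, *pattern.split("/")), recursive=recursive)
        return sorted(os.path.relpath(h, root).replace(os.sep, "/") for h in hits)

    recursive_csv = matches("data/**/*.csv", recursive=True)  # "**" spans directories
    single_char = matches("data/?.csv")                       # "?" matches one character
    char_class = matches("data/[ab].csv")                     # "[..]" matches a character set
```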

head(path, size=1024)#

Get the first size bytes from file

isfile(path)#

Is this entry file-like?

put(lpath, rpath, recursive=False, callback=<fsspec.callbacks.NoOpCallback object>, maxdepth=None, **kwargs)#

Copy file(s) from local.

Copies a specific file or tree of files (if recursive=True). If rpath ends with a “/”, it will be assumed to be a directory, and target files will go within.

Calls put_file for each source.

__init__(domain=None, **storage_options)[source]#

Opens the repository at repo_url as an fsspec file system handle, providing read-only operations such as ls, glob, and open.

User and token are needed for private repositories and they can be set with pyxet.login.

Examples:

import pyxet
fs = pyxet.XetFS()

# List files.
fs.ls('XetHub/Flickr30k/main')

# Read the entire contents of a file as bytes
b = fs.open('XetHub/Flickr30k/main/results.csv').read()

The Xet repository endpoint can be set with the ‘domain’ argument or the XET_ENDPOINT environment variable. If neither is specified, the default domain is xethub.com.

add_deduplication_hints(path_urls)[source]#

Fetches and downloads all of the metadata needed for binary deduplication against all the paths given by path_urls. Once fetched, new data will be deduplicated against any binary content found at those paths.

branch_info(url)[source]#

Returns information about a branch user/repo/branch or xet://user/repo/branch

cancel_transaction()[source]#

Cancels any active transactions. This is the non-context version.

cp_file(path1, path2, *args, **kwargs)[source]#

Copies a file or directory from a xet path to another xet path.

Copies must be performed within the context of a transaction and are allowed to span branches

delete_branch(repo, branch_name)[source]#

Deletes a branch in a repo

end_transaction()[source]#

Finish write transaction, non-context version. See start_transaction()

get_username()[source]#

Returns the inferred username for the domain

info(url)[source]#

Returns information about a path user/repo/branch/[path] or xet://user/repo/branch/[path]

is_repo(path)[source]#

Returns true if the path is a repo

isdir(path)[source]#

Is this entry directory-like?

isdir_or_branch(path)[source]#

Is this entry directory-like?

list_branches(path, raw=False, **kwargs)[source]#

Lists the branches for a path of the form user/repo or xet://user/repo

list_repos(raw=False, **kwargs)[source]#

Lists the repos available for a path of the form user or xet://user

ls(path, detail=True, **kwargs)[source]#

List objects at path. This should include subdirectories and files at that location. The difference between a file and a directory must be clear when details are requested. The specific keys, or perhaps a FileInfo class, or similar, is TBD, but must be consistent across implementations. Must include:

  • full path to the entry (without protocol)

  • size of the entry, in bytes. If the value cannot be determined, will be None.

  • type of entry, “file”, “directory” or other

Additional information may be present, appropriate to the file-system, e.g., generation, checksum, etc. May use refresh=True|False to allow use of self._ls_from_cache to check for a saved listing and avoid calling the backend. This would be common where listing may be expensive.

Parameters:
  • path – str

  • detail – bool if True, gives a list of dictionaries, where each is the same as the result of info(path). If False, gives a list of paths (str).

  • kwargs – may have additional backend-specific options, such as version information

Returns:

List of strings if detail is False, or list of directory information dicts if detail is True. These dicts would have: name (full path in the FS), size (in bytes), type (file, directory, or something else) and other FS-specific keys.
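The two return shapes can be compared side by side. The entries below are invented sample data and `ls_sketch` is a hypothetical stand-in; only the name, size and type keys come from the description above.

```python
# Invented listing: one directory and one file.
entries = [
    {"name": "user/repo/main/data", "size": 0, "type": "directory"},
    {"name": "user/repo/main/README.md", "size": 120, "type": "file"},
]


def ls_sketch(listing, detail=True):
    """Return info dicts when detail is True, otherwise just the paths."""
    if detail:
        return [dict(entry) for entry in listing]
    return [entry["name"] for entry in listing]


names = ls_sketch(entries, detail=False)
files_only = [e["name"] for e in ls_sketch(entries) if e["type"] == "file"]
```

Requesting detail is what makes it possible to tell files from directories without a second call per entry.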

make_branch(repo, src_branch_name, target_branch_name)[source]#

Creates a branch in a repo

makedir(*args, **kwargs)[source]#

Noop. Empty directories cannot be created

makedirs(*args, **kwargs)[source]#

Noop. Empty directories cannot be created

mkdir(*args, **kwargs)[source]#

Noop. Empty directories cannot be created

mkdirs(*args, **kwargs)[source]#

Noop. Empty directories cannot be created

move(path1, path2, *args, **kwargs)[source]#

Alias of AbstractFileSystem.mv.

mv(path1, path2, *args, **kwargs)[source]#

Moves a file or directory from a xet path to another xet path.

Moves must be performed within the context of a transaction and must be within the same branch

rm(path, *args, **kwargs)[source]#

Delete a file.

Deletions must be performed within the context of a transaction which must be scoped to within a single repository branch.

set_commit_message(message)[source]#

Sets the commit message on the active transaction

start_transaction(commit_message=None)[source]#

Begin a write transaction for a repository and branch. The entire transaction is committed atomically at the end of the transaction. All writes must be performed into this branch

repo_and_branch is of the form user/repo/branch or xet://user/repo/branch:

fs.start_transaction('my commit message')
file = fs.open('user/repo/main/hello.txt','w')
file.write('hello world')
file.close()
fs.end_transaction()

The transaction object is an instance of MultiCommitTransaction.

property transaction#

Begin a transaction context for a given repository and branch. The entire transaction is committed atomically at the end of the transaction. All writes must be performed into this branch:

with fs.transaction as tr:
    tr.set_commit_message("message")
    file = fs.open('user/repo/main/hello.txt','w')
    file.write('hello world')
    file.close()

The transaction object is an instance of MultiCommitTransaction.
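The idea of committing everything atomically at the end of the context can be sketched without pyxet. `SketchTransaction` is a hypothetical class that buffers writes in memory and applies them to a dict on a clean exit. Discarding the buffer when an exception escapes the block is an assumption modelled on common context-manager behaviour, not something stated above.

```python
class SketchTransaction:
    """Hypothetical stand-in, not pyxet's MultiCommitTransaction."""

    def __init__(self, committed):
        self.committed = committed  # dict acting as the "repository"
        self.pending = {}           # writes buffered until the context exits
        self.commit_message = None

    def set_commit_message(self, message):
        self.commit_message = message

    def write(self, path, data):
        self.pending[path] = data

    def __enter__(self):
        self.pending = {}
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            # Clean exit: apply every buffered write in one step.
            self.committed.update(self.pending)
        # On error the buffer is dropped (assumed behaviour).
        self.pending = {}
        return False  # never swallow the exception


repo = {}
with SketchTransaction(repo) as tr:
    tr.set_commit_message("add greeting")
    tr.write("user/repo/main/hello.txt", "hello world")
    visible_during = dict(repo)  # snapshot taken before the commit

try:
    with SketchTransaction(repo) as tr:
        tr.write("user/repo/main/bad.txt", "x")
        raise RuntimeError("abort")
except RuntimeError:
    pass
```

Nothing is visible in `repo` until the first block exits, and the second block leaves it unchanged.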

update_size(path, bucket_size)[source]#

Calls Xetea to update the size of a synchronized S3 bucket for the repo.

MultiCommitTransaction#

class pyxet.MultiCommitTransaction(fs, commit_message=None)[source]#

Handles a commit using the transaction interface. This transaction handler supports transactions across multiple branches by tracking them separately. Simultaneous changes across branches will require multiple actual transactions to complete.

complete(commit=True)[source]#

Finalizes and commits or cancels this transaction. The transaction can be restarted with start()

copy(src_repo_info, dest_repo_info)[source]#

Copies a file from src to dest. src_repo_info and dest_repo_info are the returned values from pyxet.parse_url(url)

mv(src_repo_info, dest_repo_info)[source]#

Moves a file from src to dest. src_repo_info and dest_repo_info are the returned values from pyxet.parse_url(url)

open_for_write(repo_info)[source]#

Opens a file for writing. repo_info is the result of pyxet.parse_url(url)

rm(repo_info)[source]#

Removes a file. repo_info is the return value of pyxet.parse_url(url)

set_commit_message(commit_message)[source]#

Sets the commit message to be used. This applies to every current uncommitted transaction and future transactions. If commit_message is None, a default message “Commit [current datetime]” is used.
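The per-branch tracking and the default commit message can be sketched together. `BranchGroupedChanges` is a hypothetical class, the paths are invented, and the timestamp format is an assumption; only the "Commit [current datetime]" wording comes from the description above.

```python
from collections import defaultdict
from datetime import datetime


class BranchGroupedChanges:
    """Hypothetical sketch: group pending writes by (repo, branch)."""

    def __init__(self, commit_message=None):
        self.commit_message = commit_message
        self.pending = defaultdict(list)

    def add(self, url_path, data):
        # Expects user/repo/branch/path; the branch is assumed to have no "/".
        user, repo, branch, path = url_path.split("/", 3)
        self.pending[(f"{user}/{repo}", branch)].append((path, data))

    def message(self):
        if self.commit_message is None:
            # Default message; the exact datetime format is an assumption.
            return f"Commit {datetime.now().isoformat(timespec='seconds')}"
        return self.commit_message

    def complete(self):
        """Return one (repo, branch, message, change count) tuple per branch."""
        commits = [
            (repo, branch, self.message(), len(changes))
            for (repo, branch), changes in self.pending.items()
        ]
        self.pending.clear()
        return commits


changes = BranchGroupedChanges()
changes.add("user/repo/main/a.txt", b"a")
changes.add("user/repo/main/b.txt", b"b")
changes.add("user/repo/dev/c.txt", b"c")
commits = changes.complete()
```

Changes to two branches produce two separate commits here, which mirrors the statement that simultaneous changes across branches need multiple actual transactions.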