File system#

Pyxet implements a simple API based on the fsspec library. Use it to access local files, remote files, and files in XetHub.

Using URLs#

Xet URLs are in the form:

xet://<repo_owner>/<repo_name>/<branch>/<path_to_file>

The <path_to_file> argument is optional if the URL refers to a repository and the xet:// prefix is optional when using pyxet.XetFS.

Accessing private repositories#

To create your own repositories or access private repositories, first create a XetHub account and set your personal access token.

pyxet.XetFS#

To work with a XetHub repository as a file system, you can use the pyxet.XetFS class. This class provides a file system handle for a XetHub repository, allowing you to perform opens, reads, and writes. The initialization of this class requires a repository URL and optional arguments for branch, user, and token. All write operations will automatically commit the change back to XetHub; the optional commit message will be applied when available.

Example usage of pyxet.XetFS:

  import pyxet

  # Create a file system handle for a repository
  fs = pyxet.XetFS()

  # List files in the repository.
  files = fs.ls('xet://XetHub/Flickr30k/main')

  # Open a file from the repository.
  f = fs.open('xet://XetHub/Flickr30k/main/results.csv')

  # Read the contents of the file.
  contents = f.read()

  # Write to a repository with an optional commit message
with fs.transaction as tr:
    tr.set_commit_message("Writing things")
    fs.open("<user_name>/<repo_name>/main/foo", 'w').write("Hello world!")

Other common utilities#

  import pyxet

  fs = pyxet.XetFS()  # fsspec filesystem

  # Read functions
  fs.info("xethub/titanic/main/titanic.csv")
  # returns repo level info: {'name': 'https://xethub.com/XetHub/titanic/titanic.csv', 'size': 61194, 'type': 'file'}

  fs.open("XetHub/titanic/main/titanic.csv", 'r').read(20)
  # returns first 20 characters: 'PassengerId,Survived'

  fs.get("XetHub/titanic/main/data/", "data", recursive=True)
  # download remote directory recursively into a local data folder

  fs.ls("XetHub/titanic/main/data/", detail=False)
  # returns ['data/titanic_0.parquet', 'data/titanic_1.parquet']

  # Write functions, with optional commit message
  with fs.transaction as tr:
    tr.set_commit_message("Write hi")
    fs.open("<user_name>/<repo_name>/main/text.txt", 'w').write("Hello world!")
  # writes "Hello World" to text.txt, Git commits the change with comment "Write hi" in the main branch of the repository

  with fs.transaction as tr:
    tr.set_commit_message("Copy file")
    fs.cp("<user_name>/<repo_name>/main/text.txt", "<user_name>/<repo_name>/main/text2.txt")
  # copies text.txt into text2.txt in the main branch of the repository, commits the change with "Copy file"

  with fs.transaction as tr:
    tr.set_commit_message("Remove file")
    fs.rm("<user_name>/<repo_name>/main/titanic2.csv")
  fs.info("XetHub/titanic/main/titanic2.csv") 
   # removes a file from the main branch of the repository with comment "Remove file"

fsspec#

Many packages such as pandas and pyarrow support the fsspec protocol. xet:// URLs must be used as file paths when interacting with these packages. For example, to read a CSV from pandas, use:

  import pyxet   # make xet protocol available to fsspec
  import pandas as pd

  df = pd.read_csv('xet://XetHub/Flickr30k/main/results.csv')