File system

Pyxet implements a simple and intuitive API based on the fsspec library. Use the same API to access local files, remote files, and files in XetHub. All operations are currently read-only; write functionality is in development.

Using URLs

Xet URLs should be of the form xet://<repo_user>/<repo_name>/<branch>/<path-to-file>, where <path-to-file> is optional if the URL refers to a repository. The xet:// prefix is not neccery when using pyxet.XetFS.

pyxet.XetFS

To work with a XetHub repository as a file system, you can use the pyxet.XetFS class. This class provides a file system handle for a XetHub repository, allowing you to perform read-only operations like ls, glob, and open. The initialization of this class requires a repository URL and optional arguments for branch, user, and token.

Example usage of pyxet.XetFS:

  import pyxet

  # Create a file system handle for a public repository.
  fs = pyxet.XetFS()

  # List files in the repository.
  files = fs.ls('xet://XetHub/Flickr30k/main')

  # Open a file from the repository.
  f = fs.open('xet://XetHub/Flickr30k/main/results.csv')

  # Read the contents of the file.
  contents = f.read()

Other common utils

import pyxet

fs = pyxet.XetFS()  # fsspec filesystem

# Reads
fs.info(
    "xdssio/titanic/main/titanic.csv")  # {'name': 'https://xethub.com/main/titanic.csv', 'size': 61194, 'type': 'file'}
fs.open("xdssio/titanic/main/titanic.csv", 'r').read(11)  # 'PassengerId'
fs.get("xdssio/titanic/main/data/*parquet", "data", recursive=True)  # Download file/directories recursively
fs.cp("xdssio/titanic/main/titanic.csv", "titanic.csv")  # fsspec cp
fs.ls("xdssio/titanic/main/data/", detail=False)  # ['data/titanic_0.parquet', 'data/titanic_1.parquet']

# Writes - You need to have write permissions to that repo
with fs.transaction("xdssio/titanic/main"):
    fs.open("xdssio/titanic/main/text.txt", 'w').write("Hello World")
with fs.transaction("xdssio/titanic/main"):
    fs.cp("xdssio/titanic/main/titanic.csv", "xdssio/titanic/main/titanic2.csv")
fs.info(
    "xdssio/titanic/main/titanic2.csv")  # {'name': 'https://xethub.com/main/titanic2.csv', 'size': 61194, 'type': 'file'}
with fs.transaction("xdssio/titanic/main"):
    fs.rm("xdssio/titanic/main/titanic2.csv")
fs.info("xdssio/titanic/main/titanic2.csv")  # FileNotFoundError: xdssio / titanic / main / titanic2.csv

fsspec

Many packages such as pandas and pyarrow support the fsspec protocol. xet:// URLs must be used as file paths in these packages. For example, to read a csv from pandas, use:

  import pyxet # make xet protocol available to fsspec
  import pandas as pd

  df = pd.read_csv('xet://XetHub/Flickr30k/main/results.csv')

All fsspec read-only functionality is supported; write operations such as flush() and write() are in development.