Published February 2, 2021 | Version 0.14.0
Software Open

datalad/datalad: 0.14.0 (February 02, 2021)

  • 1. Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Centre Jülich, Jülich, Germany and Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
  • 2. Dartmouth College, Hanover, NH, United States
  • 3. Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Centre Jülich, Jülich, Germany
  • 4. University of Texas at Austin
  • 5. UC Berkeley - UCSF Graduate Program in Bioengineering
  • 6. UC Berkeley
  • 7. Stanford University, Stanford, CA, United States
  • 8. Psychoinformatics Lab, INM-7, Research Centre Juelich
  • 9. Maze Therapeutics, South San Francisco, CA, United States

Description

Major refactoring and deprecations

  • Git versions below v2.19.1 are no longer supported. #4650

  • The minimum git-annex version is still 7.20190503. However, if you're on Windows (or use adjusted branches in general), please upgrade to at least 8.20200330, and ideally to 8.20210127, to get subdataset-related fixes. #4292 #5290

  • The minimum supported version of Python is now 3.6. #4879

  • publish is now deprecated in favor of push. It will be removed in the 0.15.0 release at the earliest.

  • A new command runner was added in v0.13. Functionality related to the old runner has now been removed: Runner, GitRunner, and run_gitcommand_on_file_list_chunks from the datalad.cmd module along with the datalad.tests.protocolremote, datalad.cmd.protocol, and datalad.cmd.protocol.prefix configuration options. #5229

  • The --no-storage-sibling switch of create-sibling-ria is deprecated in favor of --storage-sibling=off and will be removed in a later release. #5090

  • The get_git_dir static method of GitRepo is deprecated and will be removed in a later release. Use the dot_git attribute of an instance instead. #4597

  • The ProcessAnnexProgressIndicators helper from datalad.support.annexrepo has been removed. #5259

  • The save argument of install, a noop since v0.6.0, has been dropped. #5278

  • The get_URLS method of AnnexCustomRemote is deprecated and will be removed in a later release. #4955

  • ConfigManager.get now returns a single value rather than a tuple when there are multiple values for the same key, as very few callers correctly accounted for the possibility of a tuple return value. Callers can restore the old behavior by passing get_all=True. #4924

  • In 0.12.0, all of the assure_* functions in datalad.utils were renamed as ensure_*, keeping the old names around as compatibility aliases. The assure_* variants are now marked as deprecated and will be removed in a later release. #4908

  • The datalad.interface.run module, which was deprecated in 0.12.0 and kept as a compatibility shim for datalad.core.local.run, has been removed. #4583

  • The saver argument of datalad.core.local.run.run_command, marked as obsolete in 0.12.0, has been removed. #4583

  • The dataset_only argument of the ConfigManager class was deprecated in 0.12 and has now been removed. #4828

  • The linux_distribution_name, linux_distribution_release, and on_debian_wheezy attributes in datalad.utils are no longer set at import time and will be removed in a later release. Use datalad.utils.get_linux_distribution instead. #4696

  • datalad.distribution.clone, which was marked as obsolete in v0.12 in favor of datalad.core.distributed.clone, has been removed. #4904

  • datalad.support.annexrepo.N_AUTO_JOBS, announced as deprecated in v0.12.6, has been removed. #4904

  • The compat parameter of GitRepo.get_submodules, added in v0.12 as a temporary compatibility layer, has been removed. #4904

  • The long-deprecated (and non-functional) url parameter of GitRepo.__init__ has been removed. #5342
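
To illustrate the ConfigManager.get change listed above, here is a toy model of a multi-valued configuration store. It is a sketch under stated assumptions, not DataLad's implementation: the ToyConfig class and its add method are hypothetical, and only the get_all parameter name comes from the release notes. The default lookup returns the last value recorded for a key (the behavior of git config --get), while get_all=True returns a tuple when a key has several values.

```python
class ToyConfig:
    """Hypothetical stand-in for a multi-valued config store (not DataLad code)."""

    def __init__(self):
        # Each key maps to the list of values recorded for it, in order.
        self._store = {}

    def add(self, key, value):
        """Record one more value for `key`."""
        self._store.setdefault(key, []).append(value)

    def get(self, key, default=None, get_all=False):
        """Return a single value by default; a tuple only when asked for."""
        values = self._store.get(key)
        if not values:
            return default
        if get_all and len(values) > 1:
            # Pre-0.14.0 style result: callers must handle a tuple here.
            return tuple(values)
        # The last value wins, as with `git config --get`.
        return values[-1]


cfg = ToyConfig()
cfg.add("user.name", "first")
cfg.add("user.name", "second")
print(cfg.get("user.name"))
print(cfg.get("user.name", get_all=True))
```

Callers that relied on receiving every value would, under this model, need to pass get_all=True and keep their tuple handling.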

Fixes

  • Cloning onto a system that enters adjusted branches by default (as Windows does) did not properly record the clone URL. #5128

  • The RIA-specific handling after calling clone was correctly triggered by ria+http URLs but not ria+https URLs. #4977

  • If the registered commit wasn't found when cloning a subdataset, the failed attempt was left around. #5391

  • The remote calls to cp and chmod in create-sibling were not portable and failed on macOS. #5108

  • A more reliable check is now done to decide if configuration files need to be reloaded. #5276

  • The internal command runner's handling of the event loop has been improved to play nicer with outside applications and scripts that use asyncio. #5350 #5367
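
The ria+http versus ria+https fix above concerns how URL schemes are recognized. The following is a hypothetical sketch of a scheme check that treats every ria+ variant uniformly. The is_ria_url helper and the example URLs are illustrative assumptions, and the actual change in #4977 may look quite different.

```python
from urllib.parse import urlsplit


def is_ria_url(url):
    """Return True if `url` uses any ria+<transport> scheme.

    Hypothetical helper: it matches on the `ria+` prefix of the parsed
    scheme instead of comparing against one specific transport.
    """
    return urlsplit(url).scheme.startswith("ria+")


for candidate in (
    "ria+http://store.example.org#some-id",
    "ria+https://store.example.org#some-id",
    "https://example.org/dataset",
):
    print(candidate, is_ria_url(candidate))
```

Matching on the prefix rather than on a fixed scheme string means a new transport does not need its own special case.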

Enhancements and new features

  • The subdataset handling for adjusted branches, which is particularly important on Windows where git-annex enters an adjusted branch by default, has been improved. A core piece of the new approach is registering the commit of the primary branch, not its checked out adjusted branch, in the superdataset. Note: This means that git status will always consider a subdataset on an adjusted branch as dirty while datalad status will look more closely and see if the tip of the primary branch matches the registered commit. #5241

  • The performance of the subdatasets command has been improved, with substantial speedups for recursive processing of many subdatasets. #4868 #5076

  • Adding new subdatasets via save has been sped up. #4793

  • get, save, and addurls gained support for parallel operations that can be enabled via the --jobs command-line option or the new datalad.runtime.max-jobs configuration option. #5022

  • addurls

    • learned how to read data from standard input. #4669
    • now supports tab-separated input. #4845
    • now lets Python callers pass in a list of records rather than a file name. #5285
    • gained a --drop-after switch that signals to drop a file's content after downloading and adding it to the annex. #5081
    • is now able to construct a tree of files from known checksums without downloading content via its new --key option. #5184
    • records the URL file in the commit message as provided by the caller rather than using the resolved absolute path. #5091
    • is now speedier. #4867 #5022

  • create-sibling-github learned how to create private repositories (thanks to Nolan Nichols). #4769

  • create-sibling-ria gained a --storage-sibling option. When --storage-sibling=only is specified, the storage sibling is created without an accompanying Git sibling. This enables using hosts without Git installed for storage. #5090

  • The download machinery (and thus the datalad special remote) gained support for a new scheme, shub://, which follows the same format used by singularity run and friends. In contrast to the short-lived URLs obtained by querying Singularity Hub directly, shub:// URLs are suitable for registering with git-annex. #4816

  • A provider is now included for https://registry-1.docker.io URLs. This is useful for storing an image's blobs in a dataset and registering the URLs with git-annex. #5129

  • The add-readme command now links to the DataLad handbook rather than http://docs.datalad.org. #4991

  • New option datalad.locations.extra-procedures specifies an additional location that should be searched for procedures. #5156

  • The class for handling configuration values, ConfigManager, now takes a lock before writes to allow for multiple processes to modify the configuration of a dataset. #4829

  • clone now records the original, unresolved URL for a subdataset under submodule.<name>.datalad-url in the parent's .gitmodules, enabling later get calls to use the original URL. This is particularly useful for ria+ URLs. #5346

  • Installing a subdataset now uses custom handling rather than calling git submodule update --init. This avoids some locking issues when running get in parallel and enables more accurate source URLs to be recorded. #4853

  • GitRepo.get_content_info, a helper that gets triggered by many commands, got faster by tweaking its git ls-files call. #5067

  • wtf now includes credentials-related information (e.g. active backends) in its output. #4982

  • The call_git* methods of GitRepo now have a read_only parameter. Callers can set this to True to promise that the provided command does not write to the repository, bypassing the cost of some checks and locking. #5070

  • New call_annex* methods in the AnnexRepo class provide an interface for running git-annex commands similar to that of the GitRepo.call_git* methods. #5163

  • It's now possible to register a custom metadata indexer that is discovered by search and used to generate an index. #4963

  • The ConfigManager methods get, getbool, getfloat, and getint now return a single value (with same precedence as git config --get) when there are multiple values for the same key (in the non-committed git configuration, if the key is present there, or in the dataset configuration). For get, the old behavior can be restored by specifying get_all=True. #4924

  • Command-line scripts are now defined via the entry_points argument of setuptools.setup instead of the scripts argument. #4695

  • Interactive use of --help on the command-line now invokes a pager on more systems and installation setups. #5344

  • The datalad special remote now tries to eliminate some unnecessary interactions with git-annex by being smarter about how it queries for URLs associated with a key. #4955

  • The GitRepo class now does a better job of handling bare repositories, a step towards bare repository support in DataLad. #4911

  • More internal work to move the code base over to the new command runner. #4699 #4855 #4900 #4996 #5002 #5141 #5142 #5229
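
The adjusted-branch handling described at the top of this list depends on telling a primary branch apart from its adjusted counterpart. As a minimal sketch, assuming git-annex names adjusted branches adjusted/<name>(<adjustment>), for example adjusted/master(unlocked), the mapping back to the primary branch could look like the following. The primary_branch helper is hypothetical and is not DataLad's implementation.

```python
import re

# Assumed naming scheme for git-annex adjusted branches:
#   adjusted/<primary-branch>(<adjustment>)
_ADJUSTED = re.compile(r"^adjusted/(?P<name>.+)\([^()]+\)$")


def primary_branch(branch):
    """Map an adjusted branch name to its primary branch.

    Names that do not follow the adjusted-branch pattern are returned
    unchanged, since they already refer to a primary branch.
    """
    match = _ADJUSTED.match(branch)
    return match.group("name") if match else branch


for name in ("adjusted/master(unlocked)", "master"):
    print(name, "->", primary_branch(name))
```

Under this model, a superdataset would register the commit at the tip of primary_branch(current_branch) rather than the commit of the checked-out adjusted branch.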

Files

Files (1.9 MB)

datalad/datalad-0.14.0.zip (1.9 MB)
md5:ffa4c9d9a3e6d03396b228790aba93b8

Additional details

Funding

CRCNS US-German Data Sharing: DataGit - converging catalogues, warehouses, and deployment logistics into a federated 'data distribution' 1429999
National Science Foundation