Databricks Runtime 13.1 for Machine Learning (EoS)

Note

Support for this Databricks Runtime version has ended. For the end-of-support date, see End-of-support history. For all supported Databricks Runtime versions, see Databricks Runtime release notes versions and compatibility.

Databricks Runtime 13.1 for Machine Learning provides a ready-to-use environment for machine learning and data science, based on Databricks Runtime 13.1 (EoS). Databricks Runtime ML contains many popular machine learning libraries, including TensorFlow, PyTorch, and XGBoost. It includes AutoML, a tool that automatically trains machine learning pipelines, and it supports distributed deep learning training using Horovod.

For more information, including instructions for creating a Databricks Runtime ML cluster, see AI and machine learning on Databricks.

New features and improvements

Databricks Runtime 13.1 ML is built on top of Databricks Runtime 13.1. For information on what's new in Databricks Runtime 13.1, including Apache Spark MLlib and SparkR, see the Databricks Runtime 13.1 (EoS) release notes.

Changes to Databricks Feature Store

In Databricks Runtime 13.1 ML and above, in MySQL stores, publish_table uses the LONGTEXT type for string data in feature tables. If you publish a table using Databricks Runtime 13.1 ML and then need to write to it using Databricks Runtime 13.0 or below, you must use publish_table in overwrite mode or drop and republish the online table.
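
The overwrite path described above can be sketched as follows. This is a minimal illustration, assuming the Feature Store Python client exposes publish_table with name, online_store, and mode arguments (as databricks.feature_store.FeatureStoreClient does); the helper name republish_in_overwrite_mode is hypothetical, and the online store spec must be built for your own MySQL store.

```python
def republish_in_overwrite_mode(fs_client, table_name, online_store_spec):
    """Publish a feature table to an online store, replacing existing data.

    fs_client is assumed to provide publish_table(name=..., online_store=...,
    mode=...), as databricks.feature_store.FeatureStoreClient does.
    """
    return fs_client.publish_table(
        name=table_name,
        online_store=online_store_spec,
        mode="overwrite",  # overwrite instead of the default merge behavior
    )
```

On a cluster, the client would typically come from `from databricks.feature_store import FeatureStoreClient`, and the call might look like `republish_in_overwrite_mode(FeatureStoreClient(), "example_db.example_features", online_store)`, where the table name and `online_store` are placeholders.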

System environment

The system environment in Databricks Runtime 13.1 ML differs from Databricks Runtime 13.1 as follows:

Databricks Runtime 13.1 ML includes XGBoost 1.7.5, which does not support GPU clusters with compute capability 5.2 and below.
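
As a rough illustration of the constraint above, the check below compares a GPU's (major, minor) compute capability against the 5.2 threshold. The function name and constant are hypothetical; only the threshold comes from this page.

```python
# Threshold from the note above: compute capability 5.2 and below is unsupported.
MAX_UNSUPPORTED_CAPABILITY = (5, 2)


def gpu_supported_by_xgboost(capability):
    """Return True if a (major, minor) compute capability is above 5.2."""
    # Tuples compare element by element, so (6, 0) > (5, 2) and (5, 2) is not.
    return tuple(capability) > MAX_UNSUPPORTED_CAPABILITY
```

On a GPU cluster, the capability tuple can be read with PyTorch, for example `gpu_supported_by_xgboost(torch.cuda.get_device_capability(0))`.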

Libraries

The following sections list the libraries included in Databricks Runtime 13.1 ML that differ from those included in Databricks Runtime 13.1.

Top-tier libraries

Databricks Runtime 13.1 ML includes the following top-tier libraries:

Python libraries

Databricks Runtime 13.1 ML uses Virtualenv for Python package management and includes many popular ML packages.

The following Python libraries were introduced in Databricks Runtime 13.1 ML:

  • langchain
  • librosa
  • pytesseract
  • sentencepiece
  • sentence-transformers
  • soundfile
  • tiktoken

In addition to the packages specified in the following sections, Databricks Runtime 13.1 ML also includes the following packages:

  • hyperopt 0.2.7+db3
  • sparkdl 3.0.0_db1
  • automl 1.18.0

To reproduce the Databricks Runtime ML Python environment in your local Python virtual environment, download the requirements-13.1.txt file and run pip install -r requirements-13.1.txt. This command installs all of the open source libraries that Databricks Runtime ML uses, but does not install libraries developed by Databricks, such as databricks-automl, databricks-feature-store, or the Databricks fork of hyperopt.
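
After installing, one way to confirm that the local environment matches the pinned versions is to compare the requirements file against the installed distributions. The sketch below assumes the file uses plain `name==version` lines (other line types, such as options or environment markers, are simply skipped); the helper names parse_pins and find_mismatches are hypothetical.

```python
from importlib import metadata


def parse_pins(lines):
    """Collect {name: version} from 'name==version' lines, skipping the rest."""
    pins = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins, installed_version=metadata.version):
    """Return {name: (pinned, installed or None)} for packages that differ."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            found = installed_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        if found != pinned:
            mismatches[name] = (pinned, found)
    return mismatches
```

A typical use would be to open requirements-13.1.txt, pass the file object to parse_pins, and print the result of find_mismatches; an empty dictionary means every pinned package is installed at the pinned version.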

Python libraries on CPU clusters

Library Version Library Version Library Version
absl-py 1.0.0 accelerate 0.18.0 aiohttp 3.8.4
aiosignal 1.3.1 appdirs 1.4.4 argon2-cffi 21.3.0
argon2-cffi-bindings 21.2.0 astor 0.8.1 asttokens 2.2.1
astunparse 1.6.3 async-timeout 4.0.2 attrs 21.4.0
audioread 3.0.0 azure-core 1.26.4 azure-cosmos 4.3.1b1
azure-storage-blob 12.16.0 azure-storage-file-datalake 12.11.0 backcall 0.2.0
bcrypt 3.2.0 beautifulsoup4 4.11.1 black 22.6.0
bleach 4.1.0 blinker 1.4 blis 0.7.9
boto3 1.24.28 botocore 1.27.28 cachetools 4.2.4
catalogue 2.0.8 category-encoders 2.6.0 certifi 2022.9.14
cffi 1.15.1 chardet 4.0.0 charset-normalizer 2.0.4
click 8.0.4 cloudpickle 2.0.0 cmdstanpy 1.1.0
confection 0.0.4 configparser 5.2.0 convertdate 2.4.0
cryptography 37.0.1 cycler 0.11.0 cymem 2.0.7
Cython 0.29.32 databricks-automl-runtime 0.2.16 databricks-cli 0.17.6
databricks-feature-store 0.12.0 dataclasses-json 0.5.7 datasets 2.12.0
dbl-tempo 0.1.23 dbus-python 1.2.18 debugpy 1.5.1
decorator 5.1.1 defusedxml 0.7.1 dill 0.3.4
diskcache 5.6.1 distlib 0.3.6 docstring-to-markdown 0.12
entrypoints 0.4 ephem 4.1.4 evaluate 0.4.0
executing 1.2.0 facets-overview 1.0.3 fastjsonschema 2.16.3
fasttext 0.9.2 filelock 3.6.0 Flask 1.1.2
flatbuffers 23.3.3 fonttools 4.25.0 frozenlist 1.3.3
fsspec 2022.7.1 future 0.18.2 gast 0.4.0
gitdb 4.0.10 GitPython 3.1.27 google-api-core 2.8.2
google-auth 1.33.0 google-auth-oauthlib 0.4.6 google-cloud-core 2.3.2
google-cloud-storage 2.8.0 google-crc32c 1.5.0 google-pasta 0.2.0
google-resumable-media 2.5.0 googleapis-common-protos 1.56.4 greenlet 1.1.1
grpcio 1.48.1 grpcio-status 1.48.1 gunicorn 20.1.0
gviz-api 1.10.0 h5py 3.7.0 hijri-converter 2.3.1
holidays 0.22 horovod 0.27.0 htmlmin 0.1.12
httplib2 0.20.2 huggingface-hub 0.14.1 idna 3.3
ImageHash 4.3.1 imbalanced-learn 0.8.1 importlib-metadata 4.11.3
ipykernel 6.17.1 ipython 8.10.0 ipython-genutils 0.2.0
ipywidgets 7.7.2 isodate 0.6.1 itsdangerous 2.0.1
jedi 0.18.1 jeepney 0.7.1 Jinja2 2.11.3
jmespath 0.10.0 joblib 1.2.0 joblibspark 0.5.1
jsonschema 4.16.0 jupyter-client 7.3.4 jupyter_core 4.11.2
jupyterlab-pygments 0.1.2 jupyterlab-widgets 1.0.0 keras 2.11.0
keyring 23.5.0 kiwisolver 1.4.2 korean-lunar-calendar 0.3.1
langchain 0.0.152 langcodes 3.3.0 launchpadlib 1.10.16
lazr.restfulclient 0.14.4 lazr.uri 1.0.6 lazy_loader 0.2
libclang 15.0.6.1 librosa 0.10.0 lightgbm 3.3.5
llvmlite 0.38.0 LunarCalendar 0.0.9 Mako 1.2.0
Markdown 3.3.4 MarkupSafe 2.0.1 marshmallow 3.19.0
marshmallow-enum 1.5.1 matplotlib 3.5.2 matplotlib-inline 0.1.6
mccabe 0.7.0 mistune 0.8.4 mleap 0.20.0
mlflow-skinny 2.3.1 more-itertools 8.10.0 msgpack 1.0.5
multidict 6.0.4 multimethod 1.9.1 multiprocess 0.70.12.2
murmurhash 1.0.9 mypy-extensions 0.4.3 nbclient 0.5.13
nbconvert 6.4.4 nbformat 5.5.0 nest-asyncio 1.5.5
networkx 2.8.4 nltk 3.7 nodeenv 1.7.0
notebook 6.4.12 numba 0.55.1 numexpr 2.8.4
numpy 1.21.5 oauthlib 3.2.0 openai 0.27.4
openapi-schema-pydantic 1.2.4 opt-einsum 3.3.0 packaging 21.3
pandas 1.4.4 pandocfilters 1.5.0 paramiko 2.9.2
parso 0.8.3 pathspec 0.9.0 pathy 0.10.1
patsy 0.5.2 petastorm 0.12.1 pexpect 4.8.0
phik 0.12.3 pickleshare 0.7.5 Pillow 9.2.0
pip 22.2.2 platformdirs 2.5.2 plotly 5.9.0
pluggy 1.0.0 pmdarima 2.0.3 pooch 1.7.0
preshed 3.0.8 prometheus-client 0.14.1 prompt-toolkit 3.0.36
prophet 1.1.2 protobuf 3.19.4 psutil 5.9.0
psycopg2 2.9.3 ptyprocess 0.7.0 pure-eval 0.2.2
pyarrow 8.0.0 pyasn1 0.4.8 pyasn1-modules 0.2.8
pybind11 2.10.4 pycparser 2.21 pydantic 1.10.6
pyflakes 3.0.1 Pygments 2.11.2 PyGObject 3.42.1
PyJWT 2.3.0 PyMeeus 0.5.12 PyNaCl 1.5.0
pyodbc 4.0.32 pyparsing 3.0.9 pyright 1.1.294
pyrsistent 0.18.0 pytesseract 0.3.10 python-dateutil 2.8.2
python-editor 1.0.4 python-lsp-jsonrpc 1.0.0 python-lsp-server 1.7.1
pytoolconfig 1.2.2 pytz 2022.1 PyWavelets 1.3.0
PyYAML 6.0 pyzmq 23.2.0 regex 2022.7.9
requests 2.28.1 requests-oauthlib 1.3.1 responses 0.18.0
rope 1.7.0 rsa 4.9 s3transfer 0.6.0
scikit-learn 1.1.1 scipy 1.9.1 seaborn 0.11.2
SecretStorage 3.3.1 Send2Trash 1.8.0 sentence-transformers 2.2.2
sentencepiece 0.1.97 setuptools 63.4.1 shap 0.41.0
simplejson 3.17.6 six 1.16.0 slicer 0.0.7
smart-open 5.2.1 smmap 5.0.0 soundfile 0.12.1
soupsieve 2.3.1 soxr 0.3.5 spacy 3.5.1
spacy-legacy 3.0.12 spacy-loggers 1.0.4 spark-tensorflow-distributor 1.0.0
SQLAlchemy 1.4.39 sqlparse 0.4.2 srsly 2.4.6
ssh-import-id 5.11 stack-data 0.6.2 statsmodels 0.13.2
tabulate 0.8.10 tangled-up-in-unicode 0.2.0 tenacity 8.1.0
tensorboard 2.11.0 tensorboard-data-server 0.6.1 tensorboard-plugin-profile 2.11.2
tensorboard-plugin-wit 1.8.1 tensorflow-cpu 2.11.0 tensorflow-estimator 2.11.0
tensorflow-io-gcs-filesystem 0.32.0 termcolor 2.3.0 terminado 0.13.1
testpath 0.6.0 thinc 8.1.9 threadpoolctl 2.2.0
tiktoken 0.3.3 tokenize-rt 4.2.1 tokenizers 0.13.3
tomli 2.0.1 torch 1.13.1+cpu torchvision 0.14.1+cpu
tornado 6.1 tqdm 4.64.1 traitlets 5.1.1
transformers 4.28.1 typeguard 2.13.3 typer 0.7.0
typing-inspect 0.8.0 typing_extensions 4.3.0 ujson 5.4.0
unattended-upgrades 0.1 urllib3 1.26.11 virtualenv 20.16.3
visions 0.7.5 wadllib 1.3.6 wasabi 1.1.1
wcwidth 0.2.5 webencodings 0.5.1 websocket-client 0.58.0
Werkzeug 2.0.3 whatthepatch 1.0.2 wheel 0.37.1
widgetsnbextension 3.6.1 wrapt 1.14.1 xgboost 1.7.5
xxhash 3.2.0 yapf 0.31.0 yarl 1.9.2
ydata-profiling 4.1.2 zipp 3.8.0

Python libraries on GPU clusters

Library Version Library Version Library Version
absl-py 1.0.0 accelerate 0.18.0 aiohttp 3.8.4
aiosignal 1.3.1 appdirs 1.4.4 argon2-cffi 21.3.0
argon2-cffi-bindings 21.2.0 astor 0.8.1 asttokens 2.2.1
astunparse 1.6.3 async-timeout 4.0.2 attrs 21.4.0
audioread 3.0.0 azure-core 1.26.4 azure-cosmos 4.3.1b1
azure-storage-blob 12.16.0 azure-storage-file-datalake 12.11.0 backcall 0.2.0
bcrypt 3.2.0 beautifulsoup4 4.11.1 black 22.6.0
bleach 4.1.0 blinker 1.4 blis 0.7.9
boto3 1.24.28 botocore 1.27.28 cachetools 4.2.4
catalogue 2.0.8 category-encoders 2.6.0 certifi 2022.9.14
cffi 1.15.1 chardet 4.0.0 charset-normalizer 2.0.4
click 8.0.4 cloudpickle 2.0.0 cmdstanpy 1.1.0
confection 0.0.4 configparser 5.2.0 convertdate 2.4.0
cryptography 37.0.1 cycler 0.11.0 cymem 2.0.7
Cython 0.29.32 databricks-automl-runtime 0.2.16 databricks-cli 0.17.6
databricks-feature-store 0.12.0 dataclasses-json 0.5.7 datasets 2.12.0
dbl-tempo 0.1.23 dbus-python 1.2.18 debugpy 1.5.1
decorator 5.1.1 defusedxml 0.7.1 dill 0.3.4
diskcache 5.6.1 distlib 0.3.6 docstring-to-markdown 0.12
entrypoints 0.4 ephem 4.1.4 evaluate 0.4.0
executing 1.2.0 facets-overview 1.0.3 fastjsonschema 2.16.3
fasttext 0.9.2 filelock 3.6.0 Flask 1.1.2
flatbuffers 23.3.3 fonttools 4.25.0 frozenlist 1.3.3
fsspec 2022.7.1 future 0.18.2 gast 0.4.0
gitdb 4.0.10 GitPython 3.1.27 google-api-core 2.8.2
google-auth 1.33.0 google-auth-oauthlib 0.4.6 google-cloud-core 2.3.2
google-cloud-storage 2.8.0 google-crc32c 1.5.0 google-pasta 0.2.0
google-resumable-media 2.5.0 googleapis-common-protos 1.56.4 greenlet 1.1.1
grpcio 1.48.1 grpcio-status 1.48.1 gunicorn 20.1.0
gviz-api 1.10.0 h5py 3.7.0 hijri-converter 2.3.1
holidays 0.22 horovod 0.27.0 htmlmin 0.1.12
httplib2 0.20.2 huggingface-hub 0.14.1 idna 3.3
ImageHash 4.3.1 imbalanced-learn 0.8.1 importlib-metadata 4.11.3
ipykernel 6.17.1 ipython 8.10.0 ipython-genutils 0.2.0
ipywidgets 7.7.2 isodate 0.6.1 itsdangerous 2.0.1
jedi 0.18.1 jeepney 0.7.1 Jinja2 2.11.3
jmespath 0.10.0 joblib 1.2.0 joblibspark 0.5.1
jsonschema 4.16.0 jupyter-client 7.3.4 jupyter_core 4.11.2
jupyterlab-pygments 0.1.2 jupyterlab-widgets 1.0.0 keras 2.11.0
keyring 23.5.0 kiwisolver 1.4.2 korean-lunar-calendar 0.3.1
langchain 0.0.152 langcodes 3.3.0 launchpadlib 1.10.16
lazr.restfulclient 0.14.4 lazr.uri 1.0.6 lazy_loader 0.2
libclang 15.0.6.1 librosa 0.10.0 lightgbm 3.3.5
llvmlite 0.38.0 LunarCalendar 0.0.9 Mako 1.2.0
Markdown 3.3.4 MarkupSafe 2.0.1 marshmallow 3.19.0
marshmallow-enum 1.5.1 matplotlib 3.5.2 matplotlib-inline 0.1.6
mccabe 0.7.0 mistune 0.8.4 mleap 0.20.0
mlflow-skinny 2.3.1 more-itertools 8.10.0 msgpack 1.0.5
multidict 6.0.4 multimethod 1.9.1 multiprocess 0.70.12.2
murmurhash 1.0.9 mypy-extensions 0.4.3 nbclient 0.5.13
nbconvert 6.4.4 nbformat 5.5.0 nest-asyncio 1.5.5
networkx 2.8.4 nltk 3.7 nodeenv 1.7.0
notebook 6.4.12 numba 0.55.1 numexpr 2.8.4
numpy 1.21.5 oauthlib 3.2.0 openai 0.27.4
openapi-schema-pydantic 1.2.4 opt-einsum 3.3.0 packaging 21.3
pandas 1.4.4 pandocfilters 1.5.0 paramiko 2.9.2
parso 0.8.3 pathspec 0.9.0 pathy 0.10.1
patsy 0.5.2 petastorm 0.12.1 pexpect 4.8.0
phik 0.12.3 pickleshare 0.7.5 Pillow 9.2.0
pip 22.2.2 platformdirs 2.5.2 plotly 5.9.0
pluggy 1.0.0 pmdarima 2.0.3 pooch 1.7.0
preshed 3.0.8 prompt-toolkit 3.0.36 prophet 1.1.2
protobuf 3.19.4 psutil 5.9.0 psycopg2 2.9.3
ptyprocess 0.7.0 pure-eval 0.2.2 pyarrow 8.0.0
pyasn1 0.4.8 pyasn1-modules 0.2.8 pybind11 2.10.4
pycparser 2.21 pydantic 1.10.6 pyflakes 3.0.1
Pygments 2.11.2 PyGObject 3.42.1 PyJWT 2.3.0
PyMeeus 0.5.12 PyNaCl 1.5.0 pyodbc 4.0.32
pyparsing 3.0.9 pyright 1.1.294 pyrsistent 0.18.0
pytesseract 0.3.10 python-dateutil 2.8.2 python-editor 1.0.4
python-lsp-jsonrpc 1.0.0 python-lsp-server 1.7.1 pytoolconfig 1.2.2
pytz 2022.1 PyWavelets 1.3.0 PyYAML 6.0
pyzmq 23.2.0 regex 2022.7.9 requests 2.28.1
requests-oauthlib 1.3.1 responses 0.18.0 rope 1.7.0
rsa 4.9 s3transfer 0.6.0 scikit-learn 1.1.1
scipy 1.9.1 seaborn 0.11.2 SecretStorage 3.3.1
Send2Trash 1.8.0 sentence-transformers 2.2.2 sentencepiece 0.1.97
setuptools 63.4.1 shap 0.41.0 simplejson 3.17.6
six 1.16.0 slicer 0.0.7 smart-open 5.2.1
smmap 5.0.0 soundfile 0.12.1 soupsieve 2.3.1
soxr 0.3.5 spacy 3.5.1 spacy-legacy 3.0.12
spacy-loggers 1.0.4 spark-tensorflow-distributor 1.0.0 sqlparse 0.4.2
srsly 2.4.6 ssh-import-id 5.11 stack-data 0.6.2
statsmodels 0.13.2 tabulate 0.8.10 tangled-up-in-unicode 0.2.0
tenacity 8.1.0 tensorboard 2.11.0 tensorboard-data-server 0.6.1
tensorboard-plugin-profile 2.11.2 tensorboard-plugin-wit 1.8.1 tensorflow 2.11.0
tensorflow-estimator 2.11.0 tensorflow-io-gcs-filesystem 0.32.0 termcolor 2.3.0
terminado 0.13.1 testpath 0.6.0 thinc 8.1.9
threadpoolctl 2.2.0 tiktoken 0.3.3 tokenize-rt 4.2.1
tokenizers 0.13.3 tomli 2.0.1 torch 1.13.1+cu117
torchvision 0.14.1+cu117 tornado 6.1 tqdm 4.64.1
traitlets 5.1.1 transformers 4.28.1 typeguard 2.13.3
typer 0.7.0 typing-inspect 0.8.0 typing_extensions 4.3.0
ujson 5.4.0 unattended-upgrades 0.1 urllib3 1.26.11
virtualenv 20.16.3 visions 0.7.5 wadllib 1.3.6
wasabi 1.1.1 wcwidth 0.2.5 webencodings 0.5.1
websocket-client 0.58.0 Werkzeug 2.0.3 whatthepatch 1.0.2
wheel 0.37.1 widgetsnbextension 3.6.1 wrapt 1.14.1
xgboost 1.7.5 xxhash 3.2.0 yapf 0.31.0
yarl 1.9.2 ydata-profiling 4.1.2 zipp 3.8.0

R libraries

The R libraries are identical to the R libraries in Databricks Runtime 13.1.

Java and Scala libraries (Scala 2.12 cluster)

In addition to the Java and Scala libraries in Databricks Runtime 13.1, Databricks Runtime 13.1 ML contains the following JARs:

CPU clusters

Group ID Artifact ID Version
com.typesafe.akka akka-actor_2.12 2.5.23
ml.dmlc xgboost4j-spark_2.12 1.7.3
ml.dmlc xgboost4j_2.12 1.7.3
org.graphframes graphframes_2.12 0.8.2-db2-spark3.4
org.mlflow mlflow-client 2.3.1
org.scala-lang.modules scala-java8-compat_2.12 0.8.0
org.tensorflow spark-tensorflow-connector_2.12 1.15.0

GPU clusters

Group ID Artifact ID Version
com.typesafe.akka akka-actor_2.12 2.5.23
ml.dmlc xgboost4j-gpu_2.12 1.7.3
ml.dmlc xgboost4j-spark-gpu_2.12 1.7.3
org.graphframes graphframes_2.12 0.8.2-db2-spark3.4
org.mlflow mlflow-client 2.3.1
org.scala-lang.modules scala-java8-compat_2.12 0.8.0
org.tensorflow spark-tensorflow-connector_2.12 1.15.0