perrygeo/pyimpute

Language: Python

git: https://github.com/perrygeo/pyimpute

Spatial classification and regression using Scikit-learn and Rasterio

en_README.md

Python module for geospatial prediction using scikit-learn and rasterio

pyimpute provides high-level Python functions for bridging the gap between spatial data formats and machine learning software to facilitate supervised classification and regression on geospatial data. This allows you to create landscape-scale predictions based on sparse observations.

The observations, known as the training data, consist of:

  • response variables: what we are trying to predict
  • explanatory variables: variables which explain the spatial patterns of responses

The target data consists of explanatory variables represented by raster datasets. There are no response variables available for the target data; the goal is to predict a raster surface of responses. The responses can either be discrete (classification) or continuous (regression).
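To make the data shapes concrete, here is a minimal sketch of how a stack of explanatory rasters can be flattened into the two-dimensional samples-by-features array that scikit-learn estimators expect, and how a vector of predictions can be reshaped back into a raster. It uses small synthetic NumPy arrays in place of real rasters; the variable names are illustrative, and this is not necessarily how pyimpute implements it.

```python
import numpy as np

# Two synthetic "rasters" with the same 3x4 shape, standing in for
# temperature and precipitation layers (values are made up).
temperature = np.arange(12, dtype=float).reshape(3, 4)
precipitation = np.arange(12, dtype=float).reshape(3, 4) * 10.0

# Flatten each raster to one column; every row is one raster cell.
features = np.column_stack([temperature.ravel(), precipitation.ravel()])
print(features.shape)  # (12, 2)

# A stand-in for model output: one predicted value per cell.
predicted = features[:, 0] + features[:, 1]

# Reshape the flat predictions back to the raster's rows and columns.
predicted_raster = predicted.reshape(temperature.shape)
print(predicted_raster.shape)  # (3, 4)
```

Because every cell becomes one row, the same layout works for classifiers and regressors alike.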

Pyimpute Functions

  • load_training_vector: Load training data where responses are vector data (explanatory variables are always raster)
  • load_training_raster: Load training data where responses are raster data
  • stratified_sample_raster: Random sampling of raster cells based on discrete classes
  • evaluate_clf: Performs cross-validation and prints metrics to help tune your scikit-learn classifiers.
  • load_targets: Loads target raster data into data structures required by scikit-learn
  • impute: Takes target data and your scikit-learn classifier and makes predictions, outputting GeoTIFFs

These functions don't provide any ground-breaking new functionality; they merely save a lot of tedious data wrangling that would otherwise bog your analysis down in low-level details. In other words, pyimpute provides a high-level Python workflow for spatial prediction, making it easier to:

  • explore new variables
  • frequently update predictions with new information (e.g. new Landsat imagery as it becomes available)
  • bring the technique to other disciplines and geographies
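As one illustration of the kind of data wrangling involved, the sketch below draws up to a fixed number of random cells from each class of a discrete raster, which is the general idea behind stratified sampling. The helper `sample_cells_by_class` is hypothetical and written only for this example; it is not pyimpute's `stratified_sample_raster`, whose signature and behavior may differ.

```python
import numpy as np

def sample_cells_by_class(class_raster, samples_per_class, seed=0):
    """Return flat cell indices, drawing up to samples_per_class per class.

    Hypothetical helper for illustration; not part of pyimpute.
    """
    rng = np.random.default_rng(seed)
    flat = class_raster.ravel()
    selected = []
    for value in np.unique(flat):
        # Indices of all cells belonging to this class.
        candidates = np.flatnonzero(flat == value)
        n = min(samples_per_class, candidates.size)
        selected.append(rng.choice(candidates, size=n, replace=False))
    return np.concatenate(selected)

# Synthetic class raster: class 0 is common, class 1 covers only 3 cells.
classes = np.zeros((10, 10), dtype=int)
classes[0, :3] = 1

idx = sample_cells_by_class(classes, samples_per_class=5)
print(classes.ravel()[idx].tolist())  # [0, 0, 0, 0, 0, 1, 1, 1]
```

Sampling per class keeps rare classes represented in the training set instead of being swamped by the dominant class.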

Basic example

Here's what a pyimpute workflow might look like. In this example, we have two explanatory variables as rasters (temperature and precipitation) and a GeoJSON file with point observations of habitat suitability for a plant species. Our goal is to predict habitat suitability across the entire region based only on the explanatory variables.

import os

from pyimpute import load_training_vector, load_targets, impute, evaluate_clf
from sklearn.ensemble import RandomForestClassifier

Load some training data

explanatory_rasters = ['temperature.tif', 'precipitation.tif']
response_data = 'point_observations.geojson'

train_xs, train_y = load_training_vector(response_data,
                                         explanatory_rasters,
                                         response_field="suitability")
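Conceptually, loading training data from a vector response means looking up the value of each explanatory raster at every observation point. The sketch below shows that lookup for a north-up raster using plain NumPy. The origin, pixel size and point coordinates are made-up values, `sample_at_points` is a hypothetical helper, and pyimpute's own implementation may differ.

```python
import numpy as np

def sample_at_points(raster, x_origin, y_origin, pixel_size, points):
    """Return the raster value under each (x, y) point.

    Assumes a north-up grid whose top-left corner is (x_origin, y_origin)
    and square pixels; all points must fall inside the raster.
    """
    xs, ys = np.asarray(points, dtype=float).T
    # Columns grow with x; rows grow as y decreases from the top edge.
    cols = np.floor((xs - x_origin) / pixel_size).astype(int)
    rows = np.floor((y_origin - ys) / pixel_size).astype(int)
    return raster[rows, cols]

raster = np.arange(12, dtype=float).reshape(3, 4)  # 3 rows, 4 columns
points = [(0.5, 2.5), (3.5, 0.5)]                  # (x, y) pairs

values = sample_at_points(raster, x_origin=0.0, y_origin=3.0,
                          pixel_size=1.0, points=points)
print(values.tolist())  # [0.0, 11.0]
```

Repeating this lookup for each explanatory raster yields one feature column per raster and one row per observation.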

Train a scikit-learn classifier

clf = RandomForestClassifier(n_estimators=10, n_jobs=1)
clf.fit(train_xs, train_y)

Evaluate the classifier using several validation metrics, manually inspecting the output

evaluate_clf(clf, train_xs, train_y)
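For readers who want to see what cross-validation looks like in scikit-learn itself, here is a small sketch using `cross_val_score` on synthetic data. It is only an approximation of the kind of check `evaluate_clf` performs, not its implementation; the synthetic dataset and the choice of five folds are assumptions made for this example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for train_xs / train_y (two explanatory variables).
xs, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                            n_redundant=0, random_state=0)

clf = RandomForestClassifier(n_estimators=10, random_state=0)

# Five-fold cross-validated accuracy: each fold is held out once
# while the model is fitted on the remaining four.
scores = cross_val_score(clf, xs, y, cv=5, scoring="accuracy")
print("accuracy: %.2f (+/- %.2f)" % (scores.mean(), scores.std() * 2))
```

A large spread between folds is a hint that the training sample is too small or unevenly distributed.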

Load target raster data

target_xs, raster_info = load_targets(explanatory_rasters)

Make predictions, outputting GeoTIFFs

impute(target_xs, clf, raster_info, outdir='/tmp',
        linechunk=400, class_prob=True, certainty=True)

assert os.path.exists("/tmp/responses.tif")
assert os.path.exists("/tmp/certainty.tif")
assert os.path.exists("/tmp/probability_0.tif")
assert os.path.exists("/tmp/probability_1.tif")
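The `linechunk`, `class_prob` and `certainty` arguments suggest that prediction proceeds in blocks of raster rows and can produce per-class probabilities and a certainty surface. As a rough sketch of that idea, the code below predicts a synthetic raster a few rows at a time with scikit-learn's `predict_proba`. It assumes that certainty means the highest class probability in each cell, which the README does not state, and it keeps results in memory rather than writing GeoTIFFs.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic training data: class 1 when the two features sum above 1.
train_xs = rng.random((200, 2))
train_y = (train_xs.sum(axis=1) > 1.0).astype(int)
clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(train_xs, train_y)

# Synthetic target: a 6x5 raster flattened to (cells, features).
shape = (6, 5)
target_xs = rng.random((shape[0] * shape[1], 2))

chunk_rows = 2  # raster rows predicted per chunk (illustrative value)
responses = np.empty(shape, dtype=train_y.dtype)
certainty = np.empty(shape, dtype=float)

for start in range(0, shape[0], chunk_rows):
    stop = min(start + chunk_rows, shape[0])
    # Cells are stored row by row, so a block of rows is a contiguous slice.
    cells = target_xs[start * shape[1]:stop * shape[1]]
    proba = clf.predict_proba(cells)  # one column per class
    # Predicted class = most probable class; certainty = its probability
    # (an assumed definition for this sketch).
    best = proba.argmax(axis=1)
    responses[start:stop] = clf.classes_[best].reshape(-1, shape[1])
    certainty[start:stop] = proba.max(axis=1).reshape(-1, shape[1])

print(responses.shape, certainty.shape)  # (6, 5) (6, 5)
```

Working in row blocks bounds memory use, since only one chunk of cells and its probabilities are held at a time.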

Installation

Assuming you have libgdal and the scipy system dependencies installed, you can install with pip:

pip install pyimpute

Alternatively, install from the source code:

git clone https://github.com/perrygeo/pyimpute.git
cd pyimpute
pip install -e .

See the .travis.yml file for a working example on Ubuntu systems.

Other resources

For an overview, watch my presentation at FOSS4G 2014: Spatial-Temporal Prediction of Climate Change Impacts using pyimpute, scikit-learn and GDAL — Matthew Perry

Also, check out the examples and the wiki