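Note
This lesson was written by Linfiniti and S. Motala (Cape Peninsula University of Technology, South Africa).
Spatial statistics let you analyze a given vector dataset and understand what it means. QGIS includes several standard statistical analysis tools that are known to be useful for this purpose.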
The goal for this lesson: To know how to use QGIS’ spatial statistics tools.
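To get a point dataset to work with in this lesson, we will create a random set of points.
To do so, you need a polygon dataset that defines the extent of the area in which the points will be created.
We will use the area covered by the streets.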
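Generating random points inside a polygon can be sketched outside QGIS as rejection sampling: draw points uniformly in the polygon's bounding box and keep only those that fall inside. This is a minimal illustration, not the algorithm QGIS itself uses; the function names and the L-shaped `study_area` are hypothetical.

```python
import random


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices, without repeating the first one.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_points_in_polygon(polygon, count, seed=None):
    """Return `count` random points that fall inside the polygon."""
    rng = random.Random(seed)
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    points = []
    while len(points) < count:
        # Candidate drawn uniformly from the bounding box.
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):
            points.append((x, y))
    return points


# Hypothetical study area: an L-shaped polygon in arbitrary map units.
study_area = [(0, 0), (10, 0), (10, 4), (4, 4), (4, 10), (0, 10)]
samples = random_points_in_polygon(study_area, 100, seed=42)
print(len(samples))
```

Passing a `seed` makes the sample reproducible; without one, every run yields a different set of points, just as each run of the QGIS tool does.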
Note
You might find that your SRTM DEM layer has a different CRS to that of the roads layer. If so, you can reproject either the roads or DEM layer using techniques learnt earlier in this module.
Now you can check the sampled data from the raster file in the attribute table of the random_samples layer. The values are stored in a column named srtm_41_19.tif.
You should see a sample layer similar to this:
The sample points are classified by their value such that darker points are at a lower altitude.
You will use this sample layer for the rest of the statistical exercises.
Now let's get the basic statistics for this layer.
Note
You can copy and paste the results into a spreadsheet. The data uses a colon (:) separator.
To understand the statistics above, refer to this definition list:
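The mean (average) value is the sum of all the values divided by the number of values.
The standard deviation indicates how closely the values are clustered around the mean. The smaller the standard deviation, the more the values tend to lie close to the mean.
The sum is all the values added together.
The minimum value.
The maximum value.
The number of samples/values.
The range is the difference between the minimum and maximum values.
The median is the middle value when all the values are arranged from smallest to largest (or the average of the two middle values if N is even).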
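The definitions above can be checked by hand with Python's standard `statistics` module. This is only a sketch: the elevation values are made up, and using the population standard deviation (`pstdev`) is an assumption, since the text does not say which variant QGIS reports.

```python
import statistics


def basic_statistics(values):
    """Compute the summary statistics described in the definition list."""
    return {
        "mean": statistics.mean(values),
        # Assumption: population standard deviation (divide by N).
        "stddev": statistics.pstdev(values),
        "sum": sum(values),
        "min": min(values),
        "max": max(values),
        "n": len(values),
        "range": max(values) - min(values),
        "median": statistics.median(values),
    }


# Hypothetical sampled elevations, not taken from the course data.
elevations = [2, 4, 4, 4, 5, 5, 7, 9]
stats = basic_statistics(elevations)
for name, value in stats.items():
    print(f"{name}: {value}")
```

With eight values (an even N), the median is the average of the two middle values, 4 and 5.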
To generate a distance matrix using these points:
Set it up like this:
To do a nearest neighbor analysis:
Note
You can copy and paste the results into a spreadsheet. The data uses a colon (:) separator.
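To make the two analyses above more concrete, the following sketch builds a full distance matrix and derives a nearest neighbor index from it. It uses the widely cited Clark and Evans formulation, in which the expected mean distance for a random pattern is 0.5 / sqrt(N / A). Whether QGIS computes its result in exactly this way, and which area A it uses, are assumptions here; the function names and sample points are hypothetical.

```python
import math


def distance_matrix(points):
    """Return matrix[i][j], the Euclidean distance from point i to point j."""
    return [[math.dist(p, q) for q in points] for p in points]


def nearest_neighbour_summary(points, area):
    """Summarize nearest neighbor distances for points within `area`."""
    n = len(points)
    matrix = distance_matrix(points)
    # For each point, the smallest distance to any other point.
    nearest = [
        min(d for j, d in enumerate(row) if j != i)
        for i, row in enumerate(matrix)
    ]
    observed = sum(nearest) / n
    # Expected mean distance for a random pattern of the same density.
    expected = 0.5 / math.sqrt(n / area)
    return {
        "observed_mean_distance": observed,
        "expected_mean_distance": expected,
        "nearest_neighbour_index": observed / expected,
        "n": n,
    }


# Hypothetical points at the corners of a unit square, study area of 4 units.
points = [(0, 0), (1, 0), (0, 1), (1, 1)]
summary = nearest_neighbour_summary(points, area=4.0)
print(summary)
```

An index below 1 suggests clustering, a value near 1 a random pattern, and a value above 1 a dispersed pattern, as with the evenly spaced corner points in this example.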
To get the mean coordinates of a dataset:
Let's compare this layer with the central coordinate of the polygon that was used to create the random sample.
As you can see from the example below, the mean coordinates and the center of the study area (in orange) don’t necessarily coincide:
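The difference between the two locations follows from how each is computed: the mean coordinates depend only on where the points happen to fall, while the centroid depends only on the polygon's shape. A minimal sketch, with hypothetical function names and made-up coordinates:

```python
def mean_coordinates(points):
    """Average the x and y coordinates of the points separately."""
    n = len(points)
    return (
        sum(x for x, _ in points) / n,
        sum(y for _, y in points) / n,
    )


def polygon_centroid(polygon):
    """Area-weighted centroid of a simple polygon (shoelace formula)."""
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        twice_area += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    # Centroid = sum / (6 * area), and area = twice_area / 2.
    return (cx / (3 * twice_area), cy / (3 * twice_area))


# Hypothetical square study area with points bunched toward one corner.
area_polygon = [(0, 0), (10, 0), (10, 10), (0, 10)]
sample_points = [(1, 1), (2, 3), (3, 2), (9, 9)]

print(mean_coordinates(sample_points))
print(polygon_centroid(area_polygon))
```

Because three of the four points sit near the lower-left corner, the mean coordinates are pulled away from the center of the square.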
The histogram of a dataset shows the distribution of its values. The simplest way to demonstrate this in QGIS is via the image histogram, available in the Layer Properties dialog of any image layer.
Select Properties.
Select the Histogram tab. You may need to click the Compute Histogram button to generate the graphic. You will see a graph showing the frequency of the values in the image.
You can export the graph as an image as follows:
The mean value is 332.8, and the maximum value is 1699! But those values don’t show up on the histogram. Why not? It’s because there are so few of them, compared to the abundance of pixels with values below the mean. That’s also why the histogram extends so far to the right, even though there is no visible red line marking the frequency of values higher than about 250.
So keep in mind that a histogram shows the distribution of values, and that not every value is necessarily visible on the graph.
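The effect can be reproduced with a simple binning sketch. The pixel values below are invented for illustration (only the maximum of 1699 echoes the text), and `histogram` is a hypothetical helper, not a QGIS function.

```python
def histogram(values, bin_width):
    """Count how many values fall into each bin, keyed by the bin's start."""
    counts = {}
    for v in values:
        start = (v // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical image: thousands of low pixels and a handful of high ones.
pixels = [10] * 5000 + [120] * 3000 + [240] * 1500 + [900] * 3 + [1699]
counts = histogram(pixels, bin_width=250)

total = len(pixels)
for start, count in counts.items():
    share = count / total
    print(f"{start:>5} to {start + 249:<5} {count:>5} pixels ({share:.2%})")
```

The bins that hold the high values exist, which is why the axis stretches so far to the right, but each contains so small a share of the pixels that its bar would be too short to see on a plot.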
Let’s say you have a collection of sample points from which you would like to extrapolate data. For example, you might have access to the random_samples dataset we created earlier, and would like to have some idea of what the terrain looks like.
To start, launch the Grid (Interpolation) tool by clicking on the Raster ‣ Analysis ‣ Grid (Interpolation) menu item.
Here is a comparison of the original dataset (left) with the one constructed from the sample points (right). Yours may look different, depending on the random locations of your sample points.
As you can see, 100 sample points aren’t really enough to get a detailed impression of the terrain. It gives a very general idea, but it can be misleading as well. For example, in the image above, it is not clear that there is a high, unbroken mountain running from east to west; rather, the image seems to show a valley, with high peaks to the west. Just using visual inspection, we can see that the sample dataset is not representative of the terrain.
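To see why sparse samples give only a general impression, it helps to look at how a surface is estimated between points. The sketch below uses inverse distance weighting (IDW), one common interpolation method; the text does not state which algorithm the Grid (Interpolation) tool is configured to use, so treating it as IDW is an assumption, and all names and values here are hypothetical.

```python
import math


def idw(samples, x, y, power=2.0):
    """Estimate the value at (x, y) from samples given as (x, y, value)."""
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0:
            # Exactly on a sample point: return its value unchanged.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def interpolate_grid(samples, min_x, min_y, cell_size, cols, rows, power=2.0):
    """Interpolate at each cell center; rows run from south to north."""
    grid = []
    for r in range(rows):
        y = min_y + (r + 0.5) * cell_size
        row = [
            idw(samples, min_x + (c + 0.5) * cell_size, y, power)
            for c in range(cols)
        ]
        grid.append(row)
    return grid


# Hypothetical elevation samples: a low point and a high point.
samples = [(0, 0, 100.0), (10, 0, 200.0)]
print(idw(samples, 5, 0))
grid = interpolate_grid(samples, min_x=0, min_y=0, cell_size=2, cols=5, rows=2)
```

Every estimate is a weighted average of the sampled values, so the surface can never rise above the highest sample or drop below the lowest one. A ridge or valley that no sample point happens to hit is simply absent from the result.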
Use these points to sample the original DEM.
The result will look similar to this (depending on where your random points fell):
The border shows the roads_hull layer (which represents the boundary of the random sample points) to explain the sudden lack of detail beyond its edges. This is a much better representation of the terrain, due to the much greater density of sample points.
Here is an example of what it looks like with 10 000 sample points:
Note
It’s not recommended that you try doing this with 10 000 sample points if you are not working on a fast computer, as the size of the sample dataset requires a lot of processing time.
Originally a separate project and then accessible as a plugin, the SEXTANTE software has been added to QGIS as a core function from version 2.0. You can find it as a new QGIS menu under its new name, Processing, which gives you access to a rich toolbox of spatial analysis tools and lets you use various plugin tools from within a single interface.
You will probably see it docked in QGIS to the right of the map. Note that the tools listed here are links to the actual tools. Some of them are SEXTANTE’s own algorithms and others are links to tools that are accessed from external applications such as GRASS, SAGA or the Orfeo Toolbox. These external applications are installed with QGIS, so you can already make use of them. If you need to change the configuration of the Processing tools or, for example, update to a new version of one of the external applications, you can access the settings from Processing ‣ Options and configurations.
For a simple indication of the spatial distribution of points in the random_samples dataset, we can make use of SAGA’s Spatial Point Pattern Analysis tool via the Processing Toolbox you just opened.
Note
If SAGA is not installed on your system, the plugin’s dialog will inform you that the dependency is missing. If this is not the case, you can skip these steps.
Included in your course materials you will find the SAGA installer for Windows.
Once you have installed SAGA, you’ll need to configure SEXTANTE to find the path it was installed under.
Homebrew users can install SAGA with this command:
If you do not use Homebrew, please follow the instructions here:
http://sourceforge.net/apps/trac/saga-gis/wiki/Compiling%20SAGA%20on%20Mac%20OS%20X
Now that you have installed and configured SAGA, its functions will become accessible to you.
The output will look like this (the symbology was changed for this example):
The red dot is the mean center; the large circle is the standard distance, which gives an indication of how closely the points are distributed around the mean center; and the rectangle is the bounding box, describing the smallest possible rectangle which will still enclose all the points.
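The three outputs described above can be written down in a few lines. This is a sketch under stated assumptions, not SAGA's implementation: the standard distance is taken as the root mean square distance of the points from the mean center (dividing by N), and the function name and sample points are hypothetical.

```python
import math


def point_pattern_summary(points):
    """Return the mean center, standard distance and bounding box."""
    n = len(points)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Assumption: root mean square distance from the mean center.
    standard_distance = math.sqrt(
        sum((x - mean_x) ** 2 + (y - mean_y) ** 2 for x, y in points) / n
    )
    return {
        "mean_center": (mean_x, mean_y),
        "standard_distance": standard_distance,
        # (min_x, min_y, max_x, max_y) of the smallest enclosing rectangle.
        "bounding_box": (min(xs), min(ys), max(xs), max(ys)),
    }


# Hypothetical points at the corners of a 2 x 2 square.
pattern = point_pattern_summary([(0, 0), (2, 0), (2, 2), (0, 2)])
print(pattern)
```

A small standard distance relative to the bounding box indicates that most points lie close to the mean center, with only a few outliers stretching the box.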
Often, the output of an algorithm will not be a shapefile, but rather a table summarizing the statistical properties of a dataset. One algorithm that produces such a table is the Minimum Distance Analysis tool.
It does not require any other input besides specifying the vector point dataset to be analyzed.
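QGIS lets you perform a variety of spatial statistical analyses on the properties of a dataset.
Now that we have covered vector analysis, why not see what can be done with rasters? That is the subject of the next module!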