ex-Hull-SHP-from-HDBSCAN-clustering-probabilities

Category: GIS/Map programming
Development tool: Python
File size: 0KB
Downloads: 0
Upload date: 2021-08-17 19:56:34
Uploader: sh-1993
Description: Defines a boundary around cluster centers in a given point-layer shapefile.

File list:
LICENSE (1070, 2021-08-17)
Picture/ (0, 2021-08-17)
Picture/boston_example.jpg (7149064, 2021-08-17)
Picture/example - cluster probability values.png (110717, 2021-08-17)
Picture/example - smooth density of cluster probabilities.png (69495, 2021-08-17)
Picture/steps.png (303584, 2021-08-17)
convex_cluster.py (3831, 2021-08-17)
implementation.py (534, 2021-08-17)
layers/ (0, 2021-08-17)
layers/example_boston.cpg (5, 2021-08-17)
layers/example_boston.dbf (101759, 2021-08-17)
layers/example_boston.prj (266, 2021-08-17)
layers/example_boston.sbn (49660, 2021-08-17)
layers/example_boston.sbx (2988, 2021-08-17)
layers/example_boston.shp (135648, 2021-08-17)
layers/example_boston.shp.xml (559, 2021-08-17)
layers/example_boston.shx (38828, 2021-08-17)
layers/example_dc.cpg (5, 2021-08-17)
layers/example_dc.dbf (174461, 2021-08-17)
layers/example_dc.prj (266, 2021-08-17)
layers/example_dc.sbn (87428, 2021-08-17)
layers/example_dc.sbx (5244, 2021-08-17)
layers/example_dc.shp (232584, 2021-08-17)
layers/example_dc.shp.xml (559, 2021-08-17)
layers/example_dc.shx (66524, 2021-08-17)
layers/example_nyc.cpg (5, 2021-08-17)
layers/example_nyc.dbf (67969, 2021-08-17)
layers/example_nyc.prj (266, 2021-08-17)
layers/example_nyc.sbn (60468, 2021-08-17)
layers/example_nyc.sbx (2916, 2021-08-17)
layers/example_nyc.shp (172944, 2021-08-17)
layers/example_nyc.shp.xml (557, 2021-08-17)
layers/example_nyc.shx (49484, 2021-08-17)
output/ (0, 2021-08-17)
output/output_boston.dbf (1348, 2021-08-17)
output/output_boston.prj (266, 2021-08-17)
output/output_boston.shp (1756, 2021-08-17)
output/output_boston.shx (140, 2021-08-17)
output/output_dc.dbf (1810, 2021-08-17)
... ...

# Generate-Convex-Hull-SHP-from-HDBSCAN-clustering-probabilities

Defines a boundary around cluster centers in a given point-layer shapefile.

## Overview

Clustering geographical coordinates often gives imprecise results, because the clustering algorithm tends to assign points to a cluster that they are only loosely related to. This makes the boundaries of each cluster harder to define.

To close this gap, the code ["**convex_cluster.py**"](https://github.com/EtzionData/generate-Convex-Hull-SHP-from-HDBSCAN-clustering-probabilities/blob/master/convex_cluster.py) was developed. In the first step, the code receives an Esri shapefile (**SHP**) and extracts the X and Y coordinates from it. It also reads the layer's **Geographic Coordinate System** from the SHP, which is later used to save the code output as a new SHP.

Based on the given coordinates, the code performs clustering using the **HDBSCAN** algorithm, which returns two attributes for each point: **label** and **probability**. The label determines which cluster the point is assigned to, and the probability expresses how strongly the point belongs to that cluster. This data can be plotted as a heat map, as in this example:

![probability](https://github.com/EtzionData/generate-Convex-Hull-SHP-from-HDBSCAN-clustering-probabilities/blob/master/Picture/example%20-%20cluster%20probability%20values.png)

As the example shows, the closer a point is to the cluster center, the closer its probability is to 1. In contrast, the probability of points farther from the center approaches 0. Every cluster can have a different density, depending on how its points are distributed.
As the following plot shows, different clusters can have completely different densities (the code used to generate this plot, with full documentation, is available here: [**multi-smooth-density-plot**](https://github.com/EtzionData/create-multi-smooth-density-plot)):

![density](https://github.com/EtzionData/generate-Convex-Hull-SHP-from-HDBSCAN-clustering-probabilities/blob/master/Picture/example%20-%20smooth%20density%20of%20cluster%20probabilities.png)

Based on these data, we can choose a threshold (**"prob"**) and analyze only the points of each cluster whose probability meets it. The code creates a boundary around the chosen points using the **Convex Hull** algorithm. The result is a polygon that defines the cluster boundary according to the threshold we set. The polygon narrows the boundary to the cluster core, so that it includes only the points that satisfy the threshold.

The whole process can be described by the following steps (for **prob=0.5**):

![steps](https://github.com/EtzionData/generate-Convex-Hull-SHP-from-HDBSCAN-clustering-probabilities/blob/master/Picture/steps.png)

As you can see, cluster density plays a major role in defining the boundary. **Cluster 2** is so dense that its boundary is small and tightly focused. In contrast, **cluster 1** has a lower density, so its boundary is much larger and wider.

The generated boundary is saved as a new SHP file. When saving, the code uses the **Geographic Coordinate System** that was read from the input SHP file at the beginning of the process.
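The thresholding and Convex Hull step described above can be sketched with `scipy.spatial.ConvexHull`. This is a minimal illustration and not the code from convex_cluster.py: the helper name `hull_for_cluster`, the use of `>=` for the threshold comparison, and the hand-made sample data are all assumptions.

```python
import numpy as np
from scipy.spatial import ConvexHull


def hull_for_cluster(coords, labels, probabilities, label, prob=0.5):
    """Hypothetical helper: return the convex hull vertices of one cluster.

    Only points of the given cluster with probability >= prob are used.
    Returns None when fewer than 3 points remain, since no polygon can be
    built. (Collinear points would still make Qhull raise an error.)
    """
    mask = (labels == label) & (probabilities >= prob)
    selected = coords[mask]
    if len(selected) < 3:
        return None
    hull = ConvexHull(selected)
    # In 2-D, hull.vertices lists the boundary points in counter-clockwise order.
    return selected[hull.vertices]


# Hand-made sample: a square, one interior point, and one distant point
# that belongs to the same cluster with a low probability.
coords = np.array([[0, 0], [2, 0], [2, 2], [0, 2], [1, 1], [10, 10]], dtype=float)
labels = np.array([0, 0, 0, 0, 0, 0])
probabilities = np.array([0.9, 0.8, 0.95, 0.7, 1.0, 0.2])

loose = hull_for_cluster(coords, labels, probabilities, label=0, prob=0.0)
tight = hull_for_cluster(coords, labels, probabilities, label=0, prob=0.5)
print("boundary with prob=0.0:", loose.tolist())
print("boundary with prob=0.5:", tight.tolist())
```

With the lower threshold the distant point stretches the polygon, while the higher threshold leaves only the square around the core. This is the same focusing effect the steps figure illustrates.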
In addition to the geographical boundary, the following data is added to each row in the new layer:

- The number of points belonging to the cluster according to the threshold we set (**"count"**)
- Coordinates of the cluster center (**"center_x"** and **"center_y"**)
- Number and name of the cluster (**"id"** and **"name"**)

An example of one of the SHP files created with the code can be seen in the **Boston** area:

![Boston](https://github.com/EtzionData/generate-Convex-Hull-SHP-from-HDBSCAN-clustering-probabilities/blob/master/Picture/boston_example.jpg)

The code returns the data for each of the original points as a new pandas dataframe, along with their probability and label. It also returns the records that make up the created SHP, again as a dataframe.

All the layers used as examples in this repository are also accessible:

[**original layers**](https://github.com/EtzionData/generate-Convex-Hull-SHP-from-HDBSCAN-clustering-probabilities/tree/master/layers)

[**output layers**](https://github.com/EtzionData/generate-Convex-Hull-SHP-from-HDBSCAN-clustering-probabilities/tree/master/output)

## Libraries

The code uses the following Python libraries:

- **shapefile**
- **hdbscan**
- **pandas**
- **scipy**
- **numpy**

## Application

An application of the code is attached to this page under the name [**"implementation.py"**](https://github.com/EtzionData/generate-Convex-Hull-SHP-from-HDBSCAN-clustering-probabilities/blob/master/implementation.py). The example outputs are also attached here.
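To make the per-cluster attributes listed in the Overview more concrete, the sketch below builds a comparable records table with pandas. It is only an illustration under stated assumptions: the README does not say how the cluster center is computed, so the mean of the retained points is assumed here, and the `"cluster N"` name format, the column names of the points table, and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical per-point table: coordinates plus HDBSCAN label and probability.
points = pd.DataFrame({
    "x": [0.0, 2.0, 2.0, 0.0, 10.0, 5.0, 6.0, 7.0, 20.0],
    "y": [0.0, 0.0, 2.0, 2.0, 10.0, 5.0, 5.0, 6.0, 20.0],
    "label": [0, 0, 0, 0, 0, 1, 1, 1, -1],
    "probability": [0.9, 0.8, 0.95, 0.7, 0.2, 0.6, 0.9, 0.4, 0.0],
})

prob = 0.5  # threshold, as in the "prob" argument

# Keep clustered points (label >= 0) whose probability meets the threshold.
kept = points[(points["label"] >= 0) & (points["probability"] >= prob)]

# One row per cluster: point count and center (assumed: mean of kept points).
records = (
    kept.groupby("label")
        .agg(count=("x", "size"), center_x=("x", "mean"), center_y=("y", "mean"))
        .reset_index()
        .rename(columns={"label": "id"})
)
records["name"] = "cluster " + records["id"].astype(str)  # hypothetical naming

print(records)
```

Here `points` stands in for the per-point dataframe and `records` for the per-cluster dataframe that the code returns.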
## Example for using the code

To use this code, you just need to import it as follows:

``` python
# import the clustering function from convex_cluster.py
from convex_cluster import convex_cluster

# define variables
filename = r'path\file.shp'  # input point-layer shapefile
output = 'outputname'        # name for the output shapefile
size = 63                    # min_cluster_size for HDBSCAN
prob = 0.7                   # probability threshold, between 0 and 1

# application
convex_cluster(filename, output, size, prob)
```

The variables are:

- **filename:** path to the input shapefile
- **output:** name for the output shapefile
- **size:** min_cluster_size value for HDBSCAN clustering
- **prob:** threshold for the cluster probability value (default: 0, must be between 0 and 1)

## License

MIT [Etzion Harari](https://github.com/EtzionData)
