Proceedings of the Third World Fisheries Congress: Feeding the World with Fish in the Next Millenium—The Balance between Production and Environment

Extracting Spatio-Temporal Patterns from Ocean Fishery Data Sets in the East China Sea Using Spatial Cluster Analysis

Yunyan Du, Chenghu Zhou, Quanqin Shao, Fenzhen Su, Sheng Wang

doi: https://doi.org/10.47886/9781888569551.ch51

Since the mid-1980s, the focus of geographic information systems (GISs) has shifted from geographic databases to spatial analysis as new techniques and applications have developed, and the demand for strong spatial analysis functions in GISs has increased dramatically (Guo 1997). Earlier spatial analysis functions focused mainly on GIS graphic techniques based on the theories of geometry and topography, such as overlay and buffer, and were poorly suited to knowledge-based GISs.

Two branches are forming in this field: the combination of GIS and spatial analysis, of which Openshaw and Goodchild are representative, and spatial data mining in GIS, of which Jiawei Han and Deren Li are representative. The former examines applied research on integrating spatial analysis with GIS, as well as the potential advantages of such integration (Fischer 1994). The latter focuses mainly on applying data mining to GIS spatial databases from a knowledge discovery perspective. As Koperski et al. (1996) note, “Methods for mining spatial data should be combined with advanced spatial database, as well as statistical analysis, spatial reasoning and expert system technology to create Intelligent GIS systems” (Fotheringham and Rogerson 1994).

Because geographic phenomena are complex and uncertain, purely quantitative methods run into problems in data capture, application, and the interpretation of results; expert knowledge needs to be combined with geographic analysis. Traditional expert system methods based on symbolic inference are difficult to apply in practice because of a lack of knowledge about data capture and updating (Zhang 1999). In the artificial intelligence field, symbolic intelligence is giving way to computational intelligence. At the same time, from a data mining perspective, the exploratory (statistical) and mathematical models of data used in spatial analysis can all be considered sources of data mining. In this respect, the two research branches are similar.

Because of its vast extent and the limitations and difficulties of investigative methods, the ocean could not be understood as a distinct field of study until recently. Current numerical modeling methods are unsuitable for some ocean processes or phenomena whose underlying mechanisms are not understood. Because the ultimate goal of data mining or knowledge discovery is to discover hidden patterns or trends in complex information sources (Deekshatulu et al. n.d.), data mining tools are suitable for discovering knowledge from an ocean data set. In this paper, the quantitative relationship between the ocean fishery and corresponding environmental factors is given based on the combination of data mining and GIS. Spatio-temporal patterns were extracted from a time series of fishery productivity statistics and corresponding environmental factors in the East China Sea (24–36° N, 118.5–130° E) from 1987 to 1998 using a spatial ISODATA cluster algorithm and GIS mapping techniques.

Cluster analysis is a branch of statistics that has been studied extensively for many years. The main advantage of this technique is that interesting structures or clusters can be found directly from data without any background knowledge (Chen 1998). Recently, with the formation of data mining concepts and structure, cluster analysis has also come to be regarded as a data mining tool. Spatial cluster analysis is one way to mine knowledge from data sets that have a spatial location. Traditional cluster methods take into account only attributes when analyzing geographic data, whereas purely spatial clustering takes into account only location when calculating the distance between samples. Neither approach on its own is adequate for practical applications (Guo 1997).
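
As a minimal sketch of the point above, the following Python example mixes a location term and an attribute term into one distance and uses it in a plain k-means style loop. It is not the algorithm used in this paper: the split and merge steps of ISODATA are omitted, and the function names, the weight `w_space`, and the sample values are all hypothetical, chosen only for illustration. Attributes are assumed to be normalized to a scale comparable with the coordinates.

```python
import math


def combined_distance(a, b, w_space=0.5):
    """Distance mixing location and attributes (illustrative assumption).

    Each sample is (x, y, attrs), where attrs is a tuple of normalized
    attribute values. w_space = 1.0 uses location only; w_space = 0.0
    uses attributes only.
    """
    d_space_sq = (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    d_attr_sq = sum((p - q) ** 2 for p, q in zip(a[2], b[2]))
    return math.sqrt(w_space * d_space_sq + (1.0 - w_space) * d_attr_sq)


def mean_sample(members):
    """Component-wise mean of a non-empty list of samples."""
    n = len(members)
    x = sum(s[0] for s in members) / n
    y = sum(s[1] for s in members) / n
    n_attrs = len(members[0][2])
    attrs = tuple(sum(s[2][i] for s in members) / n for i in range(n_attrs))
    return (x, y, attrs)


def cluster(samples, initial_centers, w_space=0.5, max_iter=50):
    """k-means style loop; ISODATA split/merge steps are NOT implemented."""
    centers = list(initial_centers)
    labels = []
    for _ in range(max_iter):
        # Assign each sample to its nearest center under the mixed distance.
        labels = [
            min(range(len(centers)),
                key=lambda k: combined_distance(s, centers[k], w_space))
            for s in samples
        ]
        # Recompute each center; keep the old one if a cluster is empty.
        new_centers = []
        for k in range(len(centers)):
            members = [s for s, lab in zip(samples, labels) if lab == k]
            new_centers.append(mean_sample(members) if members else centers[k])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Hypothetical grid cells: (x, y, (normalized attribute,)).
samples = [
    (0.0, 0.0, (0.1,)),
    (0.1, 0.0, (0.9,)),
    (1.0, 1.0, (0.1,)),
    (0.9, 1.0, (0.9,)),
]

by_location, _ = cluster(samples, [samples[0], samples[2]], w_space=1.0)
by_attribute, _ = cluster(samples, [samples[0], samples[1]], w_space=0.0)
print(by_location)
print(by_attribute)
```

With `w_space=1.0` the cells are grouped by position alone, and with `w_space=0.0` by attribute alone, which are the two extremes described above; intermediate weights trade one against the other. How location and attributes should be weighted for a real fishery data set is an open choice that this sketch does not settle.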