clospan

Category: Mathematical computation
Development tool: Visual C++
File size: 1317KB
Downloads: 77
Upload date: 2008-06-08 21:38:30
Uploader: fanjun1223
Description: A C++ implementation of CloSpan, a sequential pattern mining algorithm used in data mining.

File list:
clospan\bin\clospan (449330, 2005-10-27)
clospan\bin\clospan.exe (32768, 2005-10-27)
clospan\bin\clospan_debug.exe (147581, 2005-10-27)
clospan\bin\seq_data_generator (2775166, 2005-10-29)
clospan\bin\seq_data_generator.exe (98304, 2005-11-03)
clospan\bin\stlport_vc646.dll (827392, 2005-10-27)
clospan\bin\stlport_vc6_stldebug46.dll (831552, 2005-10-27)
clospan\Makefile (474, 2005-10-27)
clospan\msvc_proj\clospan.dsp (5370, 2005-10-27)
clospan\msvc_proj\clospan.dsw (508, 2005-10-27)
clospan\src\1LPrefixSpan.cpp (11776, 2005-10-27)
clospan\src\Global.cpp (3144, 2005-10-27)
clospan\src\Global.h (2269, 2005-10-27)
clospan\src\LinuxMemMap.cpp (2569, 2005-10-27)
clospan\src\MemMap.h (622, 2005-10-27)
clospan\src\NTMemMap.cpp (3347, 2005-10-27)
clospan\src\ProjDB.cpp (28127, 2005-10-27)
clospan\src\ProjDB.h (3272, 2005-10-27)
clospan\src\SeqTree\ClosedSeqTree.cpp (34597, 2005-10-27)
clospan\src\SeqTree\ClosedSeqTree.h (4218, 2005-10-27)
clospan\src\SeqTree\ClosedTree.cpp (15484, 2005-10-27)
clospan\src\SeqTree\ClosedTree.h (2753, 2005-10-27)
clospan\src\SeqTree\MaxSeqTree.cpp (24640, 2005-10-27)
clospan\src\SeqTree\MaxSeqTree.h (4837, 2005-10-27)
clospan\src\SeqTree\SeqTree.cpp (20610, 2005-10-27)
clospan\src\SeqTree\SeqTree.h (3275, 2005-10-27)
clospan\www\finished.html (564, 2005-12-08)
clospan\www\index.html (2574, 2005-10-27)
clospan\www\running.html (591, 2005-11-03)
clospan\src\SeqTree (0, 2008-06-08)
clospan\bin (0, 2008-06-08)
clospan\msvc_proj (0, 2008-06-08)
clospan\src (0, 2008-06-08)
clospan\www (0, 2008-06-08)
clospan (0, 2008-06-08)

CloSpan: Mining Closed Sequential Patterns

Author: Xifeng Yan, University of Illinois at Urbana-Champaign

The program is built upon the PrefixSpan source code: "PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth", Jian Pei, Jiawei Han, Behzad Mortazavi-Asl, Helen Pinto, Qiming Chen, Umeshwar Dayal, Mei-Chun Hsu (ICDE'2001).

Contact: xyan@cs.uiuc.edu

Reference: X. Yan, J. Han, R. Afshar, "CloSpan: Mining Closed Sequential Patterns in Large Databases", Proc. 2003 SIAM Int. Conf. Data Mining (SDM'03), 166-177, 2003.

NOTE: For compiling under VC, please install stlport first.

How-To:

    CloSpan filename min_sup num_of_labels

Parameters:
(1) filename, your binary data
(2) min_sup, the minimum frequency of patterns
(3) num_of_labels, the number of distinct item labels

Example:

    CloSpan D10N1B.data 0.1 1000

It mines all frequent sequences from "D10N1B.data", each of which should appear in at least 10% of the sequences in the dataset. 1000 means there are 1000 different symbols in this dataset.

Input Format:

1. The input is a set of sequences; each sequence has the following format

    <(item_11, item_12, ..., item_1n)(item_21, item_22, ..., item_2m)...>
     ------------------------------  ------------------------------
             transaction 1                   transaction 2          ......

Example: <(ab)(c)(d)> <(e)(acfh)> ...

The input is stored in a binary file. A 4-byte integer "-1" separates transactions in each sequence, and another 4-byte integer "-2" separates sequences in a dataset. Each item is encoded as a 4-byte integer. For example, <(ab)(c)(d)><(e)(acfh)> is stored as

    ab-1c-1d-1-2e-1acfh-1-2

where each symbol is a 4-byte integer and all of them are concatenated together.

Output:

Program status during execution and the final results (such as timing) are printed to stdout (console). The discovered patterns are stored in a plain-text file named "ClosedPatterns". The first column in the output file shows the discovered patterns. The second column is the number of times that a pattern appears in the dataset.
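
The binary input format described above can be illustrated with a short C++ sketch that turns in-memory sequences into the "-1"/"-2"-delimited integer stream. This is not part of the CloSpan distribution: the names flatten, writeDataset, Itemset, Sequence and Dataset are hypothetical, and writing the integers in the machine's native byte order is an assumption, since the README does not specify endianness. The check that item codes never equal -1 or -2 is likewise an assumption, made so that items cannot be confused with the separators.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical in-memory types, introduced only for this illustration.
using Itemset  = std::vector<std::int32_t>;  // one transaction
using Sequence = std::vector<Itemset>;       // ordered list of transactions
using Dataset  = std::vector<Sequence>;      // all sequences

constexpr std::int32_t kTransactionEnd = -1; // closes each transaction
constexpr std::int32_t kSequenceEnd    = -2; // closes each sequence

// Flatten a dataset into the integer stream described in the README:
// the items of a transaction, then -1; after the last transaction of a
// sequence, -2.
std::vector<std::int32_t> flatten(const Dataset& db) {
    std::vector<std::int32_t> out;
    for (const Sequence& seq : db) {
        for (const Itemset& transaction : seq) {
            for (std::int32_t item : transaction) {
                // Assumption: item codes must not collide with the separators.
                assert(item != kTransactionEnd && item != kSequenceEnd);
                out.push_back(item);
            }
            out.push_back(kTransactionEnd);
        }
        out.push_back(kSequenceEnd);
    }
    return out;
}

// Write the flattened stream as raw 4-byte integers.
// Assumption: native byte order, which the README does not state.
bool writeDataset(const Dataset& db, const std::string& path) {
    const std::vector<std::int32_t> flat = flatten(db);
    std::ofstream file(path, std::ios::binary);
    if (!file) return false;
    file.write(reinterpret_cast<const char*>(flat.data()),
               static_cast<std::streamsize>(flat.size() * sizeof(std::int32_t)));
    return static_cast<bool>(file);
}
```

With an arbitrary mapping of a=1, b=2, c=3, d=4, e=5, f=6, h=8 (chosen here only for illustration), the README example <(ab)(c)(d)><(e)(acfh)> becomes the 16 integers 1 2 -1 3 -1 4 -1 -2 5 -1 1 3 6 8 -1 -2, so writeDataset would produce a 64-byte file that could then be passed as the filename argument.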
