clospan
Category: Mathematical computation
Development tool: Visual C++
File size: 1317KB
Downloads: 77
Upload date: 2008-06-08 21:38:30
Uploader: fanjun1223
Description: A C++ implementation of CloSpan, a sequential pattern mining algorithm used in data mining.
File list:
clospan\bin\clospan (449330, 2005-10-27)
clospan\bin\clospan.exe (32768, 2005-10-27)
clospan\bin\clospan_debug.exe (147581, 2005-10-27)
clospan\bin\seq_data_generator (2775166, 2005-10-29)
clospan\bin\seq_data_generator.exe (98304, 2005-11-03)
clospan\bin\stlport_vc646.dll (827392, 2005-10-27)
clospan\bin\stlport_vc6_stldebug46.dll (831552, 2005-10-27)
clospan\Makefile (474, 2005-10-27)
clospan\msvc_proj\clospan.dsp (5370, 2005-10-27)
clospan\msvc_proj\clospan.dsw (508, 2005-10-27)
clospan\src\1LPrefixSpan.cpp (11776, 2005-10-27)
clospan\src\Global.cpp (3144, 2005-10-27)
clospan\src\Global.h (2269, 2005-10-27)
clospan\src\LinuxMemMap.cpp (2569, 2005-10-27)
clospan\src\MemMap.h (622, 2005-10-27)
clospan\src\NTMemMap.cpp (3347, 2005-10-27)
clospan\src\ProjDB.cpp (28127, 2005-10-27)
clospan\src\ProjDB.h (3272, 2005-10-27)
clospan\src\SeqTree\ClosedSeqTree.cpp (34597, 2005-10-27)
clospan\src\SeqTree\ClosedSeqTree.h (4218, 2005-10-27)
clospan\src\SeqTree\ClosedTree.cpp (15484, 2005-10-27)
clospan\src\SeqTree\ClosedTree.h (2753, 2005-10-27)
clospan\src\SeqTree\MaxSeqTree.cpp (24640, 2005-10-27)
clospan\src\SeqTree\MaxSeqTree.h (4837, 2005-10-27)
clospan\src\SeqTree\SeqTree.cpp (20610, 2005-10-27)
clospan\src\SeqTree\SeqTree.h (3275, 2005-10-27)
clospan\www\finished.html (564, 2005-12-08)
clospan\www\index.html (2574, 2005-10-27)
clospan\www\running.html (591, 2005-11-03)
clospan\src\SeqTree (0, 2008-06-08)
clospan\bin (0, 2008-06-08)
clospan\msvc_proj (0, 2008-06-08)
clospan\src (0, 2008-06-08)
clospan\www (0, 2008-06-08)
clospan (0, 2008-06-08)
CloSpan: Mining Closed Sequential Patterns
Author: Xifeng Yan, University of Illinois at Urbana-Champaign
The program is built on the PrefixSpan source code; see "PrefixSpan:
Mining Sequential Patterns Efficiently by Prefix-Projected Pattern
Growth" by Jian Pei, Jiawei Han, Behzad Mortazavi-Asl, Helen Pinto,
Qiming Chen, Umeshwar Dayal, Mei-Chun Hsu (ICDE'2001).
Contact: xyan@cs.uiuc.edu
Reference: X. Yan, J. Han, R. Afshar, "CloSpan: Mining Closed
Sequential Patterns in Large Databases", Proc. 2003 SIAM Int. Conf.
Data Mining (SDM'03), pp. 166-177, 2003.
NOTE:
To compile under Visual C++ (VC), install STLport first.
How-To:
CloSpan filename min_sup num_of_labels
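Parameters: (1) filename, the input dataset in the binary format
described under "Input Format"; (2) min_sup, the minimum support of a
pattern, given as a fraction of the sequences in the dataset (0.1
means 10%); (3) num_of_labels, the number of distinct item labels.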
Example:
CloSpan D10N1B.data 0.1 1000
This mines the closed sequential patterns of "D10N1B.data", keeping
only patterns that appear in at least 10% of the sequences in the
dataset. The value 1000 means the dataset contains 1000 different
symbols.
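As a rough illustration of the 10% threshold, the following C++ sketch converts a fractional min_sup into the smallest number of sequences that satisfies it. The helper name min_support_count is hypothetical, and rounding up is an assumption; this README does not say how CloSpan itself rounds the threshold.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper (not part of CloSpan): smallest count c such that
// c / total_sequences >= min_sup. Rounding up is an assumption.
std::size_t min_support_count(std::size_t total_sequences, double min_sup) {
    assert(min_sup > 0.0 && min_sup <= 1.0);
    // Subtract a tiny epsilon so that products such as 0.1 * 10 are not
    // pushed to the next integer by floating-point error.
    double raw = min_sup * static_cast<double>(total_sequences) - 1e-9;
    return static_cast<std::size_t>(std::max(0.0, std::ceil(raw)));
}
```

Under these assumptions, a dataset of 10000 sequences with min_sup 0.1 would require a pattern to occur in at least 1000 sequences.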
Input Format:
The input is a set of sequences. Each sequence has the following
format:
<(item_11, item_12, ..., item_1n)(item_21, item_22, ..., item_2m)...>
Each parenthesized group is one transaction: (item_11, ..., item_1n)
is transaction 1, (item_21, ..., item_2m) is transaction 2, and so on.
Example:
<(ab)(c)(d)>
<(e)(acfh)>
...
The input is stored in a binary file. Each item is encoded as a 4-byte
integer. The 4-byte integer -1 follows each transaction in a sequence,
and the 4-byte integer -2 follows each sequence in the dataset. For
example, <(ab)(c)(d)><(e)(acfh)> is stored as
a b -1 c -1 d -1 -2 e -1 a c f h -1 -2
where each symbol is a 4-byte integer and all of them are concatenated
with nothing in between (the spaces above are only for readability).
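The encoding described above can be sketched in C++ as follows. This is a minimal illustration, not code from the CloSpan sources: the names Transaction, Sequence, encode_sequences and write_dataset are hypothetical, items are assumed to be non-negative integers, and the integers are assumed to be written in the byte order of the machine that runs CloSpan.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical types for this sketch: a transaction is a list of item ids,
// a sequence is a list of transactions.
using Transaction = std::vector<std::int32_t>;
using Sequence = std::vector<Transaction>;

const std::int32_t kEndOfTransaction = -1;  // written after each transaction
const std::int32_t kEndOfSequence = -2;     // written after each sequence

// Flatten the sequences into the stream of 4-byte integers described above.
std::vector<std::int32_t> encode_sequences(const std::vector<Sequence>& db) {
    std::vector<std::int32_t> out;
    for (const Sequence& seq : db) {
        for (const Transaction& t : seq) {
            for (std::int32_t item : t) {
                assert(item >= 0);  // assumption: negative values are reserved
                out.push_back(item);
            }
            out.push_back(kEndOfTransaction);
        }
        out.push_back(kEndOfSequence);
    }
    return out;
}

// Write the flattened stream as raw bytes in host byte order (an assumption).
bool write_dataset(const std::string& path, const std::vector<Sequence>& db) {
    const std::vector<std::int32_t> flat = encode_sequences(db);
    std::ofstream file(path.c_str(), std::ios::binary);
    if (!file) return false;
    file.write(reinterpret_cast<const char*>(flat.data()),
               static_cast<std::streamsize>(flat.size() * sizeof(std::int32_t)));
    return static_cast<bool>(file);
}
```

If the letters a to h are mapped to the integers 0 to 7 (an arbitrary choice made only for this illustration), the example <(ab)(c)(d)><(e)(acfh)> flattens to 0 1 -1 2 -1 3 -1 -2 4 -1 0 2 5 7 -1 -2. Whether CloSpan expects item ids to start at 0 or 1 is not stated here, so check the ids against num_of_labels before relying on this.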
Output:
Program status during execution and the final summary (such as timing)
are printed to stdout (the console).
The discovered patterns are written to a plain-text file named
"ClosedPatterns". The first column of this file holds a discovered
pattern, and the second column holds the number of times that pattern
appears in the dataset.
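A line of the two-column output could be split with a C++ sketch like the one below. The exact column separator and pattern syntax are not documented here, so this assumes the count is the last whitespace-separated token on the line and everything before it is the pattern. PatternLine and parse_pattern_line are hypothetical names, not part of CloSpan.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical record for one line of the "ClosedPatterns" file.
struct PatternLine {
    std::string pattern;  // first column, kept as raw text
    long long count;      // second column
};

// Assumption: the count is the last whitespace-separated token on the line.
// Returns false when the line does not fit that shape.
bool parse_pattern_line(const std::string& line, PatternLine& out) {
    const std::size_t end = line.find_last_not_of(" \t\r\n");
    if (end == std::string::npos) return false;  // blank line
    const std::size_t sep = line.find_last_of(" \t", end);
    if (sep == std::string::npos) return false;  // only one column
    const std::string count_text = line.substr(sep + 1, end - sep);
    if (count_text.size() > 18) return false;    // keeps std::stoll in range
    for (char c : count_text) {
        if (!std::isdigit(static_cast<unsigned char>(c))) return false;
    }
    const std::size_t pattern_end = line.find_last_not_of(" \t", sep);
    if (pattern_end == std::string::npos) return false;  // no pattern text
    out.pattern = line.substr(0, pattern_end + 1);
    out.count = std::stoll(count_text);
    return true;
}
```

The pattern text is kept as an opaque string because its syntax would need to be confirmed against a real output file.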