jssxrance
Category: Data structures
Development tool: MultiPlatform
File size: 1194KB
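Upload date: 2018-01-07 17:55:09
Uploader: permqe
Description: An algorithm for mining frequent closed sequences, one of the better-known early sequence mining algorithms.
File list: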
www\finished.html (564, 2005-12-07)
www\index.html (2574, 2005-10-26)
www\running.html (591, 2005-11-03)
src\SeqTree\ClosedSeqTree.h (4218, 2005-10-26)
src\SeqTree\ClosedTree.h (2753, 2005-10-26)
src\Global.h (2269, 2005-10-26)
src\SeqTree\MaxSeqTree.h (4837, 2005-10-26)
src\MemMap.h (622, 2005-10-26)
src\ProjDB.h (3272, 2005-10-26)
src\SeqTree\SeqTree.h (3275, 2005-10-26)
src\1LPrefixSpan.cpp (11776, 2005-10-26)
src\SeqTree\ClosedSeqTree.cpp (34597, 2005-10-26)
src\SeqTree\ClosedTree.cpp (15484, 2005-10-26)
src\Global.cpp (3144, 2005-10-26)
src\LinuxMemMap.cpp (2569, 2005-10-26)
src\SeqTree\MaxSeqTree.cpp (24640, 2005-10-26)
src\NTMemMap.cpp (3347, 2005-10-26)
src\ProjDB.cpp (28127, 2005-10-26)
src\SeqTree\SeqTree.cpp (20610, 2005-10-26)
lbin\clospan.exe (32768, 2005-10-26)
lbin\clospan_debug.exe (147581, 2005-10-26)
lbin\seq_data_generator.exe (98304, 2005-11-03)
lbin\stlport_vc6_stldebug46.dll (831552, 2005-10-26)
lbin\stlport_vc646.dll (827392, 2005-10-26)
lbin\clospan (449330, 2005-10-26)
Makefile (474, 2005-10-26)
lbin\seq_data_generator (2775166, 2005-10-28)
msvc_proj\clospan.dsp (5370, 2005-10-26)
msvc_proj\clospan.dsw (508, 2005-10-26)
src\SeqTree (0, 2017-12-01)
lbin (0, 2017-12-01)
msvc_proj (0, 2017-12-01)
src (0, 2017-12-01)
www (0, 2017-12-01)
CloSpan: Mining Closed Sequential Patterns
Author: Xifeng Yan, University of Illinois at Urbana-Champaign
The program is built upon the PrefixSpan source code; see "PrefixSpan:
Mining Sequential Patterns Efficiently by Prefix-Projected Pattern
Growth", Jian Pei, Jiawei Han, Behzad Mortazavi-Asl, Helen Pinto,
Qiming Chen, Umeshwar Dayal, Mei-Chun Hsu (ICDE'2001).
Contact: xyan@cs.uiuc.edu
Reference: X. Yan, J. Han, R. Afshar, "CloSpan: Mining Closed
Sequential Patterns in Large Databases", Proc. 2003 SIAM Int. Conf.
Data Mining (SDM'03), pp. 166-177, 2003.
NOTE:
To compile under VC, install STLport first.
How-To:
CloSpan filename min_sup num_of_labels
Parameters: (1) filename, the binary input data file (2) min_sup, the
minimum support of a pattern, given as a fraction of the sequences in
the dataset (3) num_of_labels, the number of distinct item labels.
Example:
CloSpan D10N1B.data 0.1 1000
It mines the frequent closed sequences from "D10N1B.data"; each of them
must appear in at least 10% of the sequences in the dataset. The value
1000 means that the dataset contains 1000 different symbols.
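
As an illustration of what a fractional min_sup means in absolute terms, the sketch below converts it into the smallest number of sequences a pattern must occur in. The function name min_support_count is hypothetical, and rounding up is an assumption of this sketch; the README does not say how CloSpan itself rounds the threshold.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Smallest number of sequences a pattern must occur in to reach a
// fractional support threshold.
// Assumption of this sketch: the threshold is rounded up.
std::size_t min_support_count(double min_sup, std::size_t num_sequences) {
    assert(min_sup >= 0.0 && min_sup <= 1.0);  // min_sup is a fraction
    const double raw = min_sup * static_cast<double>(num_sequences);
    return static_cast<std::size_t>(std::ceil(raw));
}
```

For example, with a threshold of 0.25 and a dataset of 10 sequences, a pattern would need to occur in at least 3 sequences under this rounding rule.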
Input Format:
1. The input is a set of sequences; each sequence has the following
format
<(item_11, item_12, ..., item_1n)(item_21, item_22, ..., item_2m)...>
 ------------------------------- -------------------------------
          transaction 1                   transaction 2          ......
Example:
<(ab)(c)(d)>
<(e)(acfh)>
...
The input is stored in a binary file. Each item is encoded as a 4-byte
integer. The 4-byte integer "-1" marks the end of a transaction within a
sequence, and the 4-byte integer "-2" marks the end of a sequence in the
dataset. For example, <(ab)(c)(d)><(e)(acfh)> is stored as
a b -1 c -1 d -1 -2 e -1 a c f h -1 -2
where each symbol is a 4-byte integer and all of them are concatenated
together with nothing in between (the spaces above are only for
readability). Note that "-1" follows every transaction, including the
last one in a sequence.
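
The encoding described above can be sketched in C++ as follows. This is not code from the CloSpan distribution: the type aliases and the functions encode_sequences, decode_sequences and write_binary are hypothetical names used only for illustration. The sketch assumes that item IDs are non-negative (so they cannot collide with -1 and -2) and that the integers are written in the byte order of the host machine, which the README does not specify.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// A transaction is a list of item IDs; a sequence is a list of transactions.
using Transaction = std::vector<std::int32_t>;
using Sequence = std::vector<Transaction>;

// Flatten a dataset into the integer stream described above:
// the items of a transaction, then -1; after the last transaction
// of a sequence, -2.
std::vector<std::int32_t> encode_sequences(const std::vector<Sequence>& db) {
    std::vector<std::int32_t> out;
    for (const Sequence& seq : db) {
        for (const Transaction& t : seq) {
            for (std::int32_t item : t) {
                assert(item >= 0);  // assumption: item IDs are non-negative
                out.push_back(item);
            }
            out.push_back(-1);  // end of transaction
        }
        out.push_back(-2);  // end of sequence
    }
    return out;
}

// Inverse of encode_sequences: rebuild the dataset from the integer stream.
std::vector<Sequence> decode_sequences(const std::vector<std::int32_t>& data) {
    std::vector<Sequence> db;
    Sequence seq;
    Transaction t;
    for (std::int32_t v : data) {
        if (v == -1) {
            seq.push_back(t);
            t.clear();
        } else if (v == -2) {
            db.push_back(seq);
            seq.clear();
        } else {
            t.push_back(v);
        }
    }
    return db;
}

// Write the stream as raw 4-byte integers.
// Assumption: host byte order matches what the reader expects.
bool write_binary(const std::string& path,
                  const std::vector<std::int32_t>& data) {
    std::ofstream f(path, std::ios::binary);
    if (!f) return false;
    f.write(reinterpret_cast<const char*>(data.data()),
            static_cast<std::streamsize>(data.size() * sizeof(std::int32_t)));
    return static_cast<bool>(f);
}
```

With the letters a to h mapped to the IDs 0 to 7, the dataset <(ab)(c)(d)><(e)(acfh)> encodes to the 16 integers 0 1 -1 2 -1 3 -1 -2 4 -1 0 2 5 7 -1 -2, which is a 64-byte file.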
Output:
While it runs, the program prints its status and the final results (such
as timing) to stdout (the console).
The discovered patterns are stored in a plain-text file named
"ClosedPatterns".
The first column of the output file shows the discovered patterns.
The second column of the output file is the number of times that a
pattern appears in the dataset.