School of Computer Science and Engineering, The University of New South Wales, Sydney 2052, Australia
Research Areas
Research Topics:
XML and Semi-Structured Data
Data Mining
Data Warehouse and OLAP
Database Query Processing
Information Extraction
Information Retrieval/Filtering
Plagiarism Detection
Spatial Database Systems
Web Searching
Publications
Top-k Set Similarity Joins C Xiao, W Wang, X Lin, H Shang, Proceedings of the 25th International Conference on Data Engineering, Patrick Kellenberger. IEEE Computer Society, 2009
Probabilistic Skyline Operator over Sliding Windows W Zhang, X Lin, Y Zhang, W Wang, J Yu, Proceedings of the 25th International Conference on Data Engineering, Patrick Kellenberger. IEEE Computer Society, 2009
Lazy Updates: An Efficient Technique to Continuously Monitoring Reverse kNN M Cheema, X Lin, Y Zhang, W Wang, W Zhang, PVLDB, 2009
Keyword Search on Structured and Semi-Structured Data Y Chen, W Wang, Z Liu, X Lin, ACM SIGMOD/PODS Conference 2009, Ugur Cetintemel, Stanley Zdonik, Donald Kossmann, Nesime Tatbul. ACM, 2009
Informative Frequent Assembled Feature for Face Detection B Zhang, G Ye, Y Wang, W Wang, J Xu, G Herman, J Yang, IEEE, 2009
Efficient Approximate Entity Extraction with Edit Distance Constraints W Wang, C Xiao, X Lin, C Zhang, ACM SIGMOD/PODS Conference 2009, Ugur Cetintemel, Stanley Zdonik, Donald Kossmann, Nesime Tatbul. ACM, 2009
Effective snippet clustering with domain knowledge S Patro, W Wang, Proceedings - 2009 1st International Conference on Advances in Databases, Knowledge, and Data Applications, DBKDA 2009, 2009
SPARK: A keyword search engine on relational databases Y Luo, W Wang, X Lin, 2008 IEEE 24th International Conference on Data Engineering, ICDE 2008, 2008
Efficient similarity joins for near duplicate detection W Wang, X Lin, J Yu, C Xiao, 17th International World Wide Web Conference, J. Huai, et al., 2008
EdJoin: an efficient algorithm for similarity joins with edit distance constraints C Xiao, W Wang, X Lin, 34th International Conference on Very Large Databases, H. Jagadish, et al., 2008
Coding-based Join Algorithms for Structural Queries on Graph-Structured XML Document H Wang, J Li, W Wang, X Lin, World Wide Web - Internet and Web Information Systems, Springer Netherlands, 2008, 510
SPARK: Top-k Keyword Query in Relational Databases Y Luo, X Lin, W Wang, X Zhou, SIGMOD 2007, Proceedings, C. Chan, et al. Association for Computing Machinery, New York, NY 10036-5701, United States, China, 2007, pp. 115 - 126
With the increasing amount of text data stored in relational databases, there is a demand for RDBMSs to support keyword queries over text data. As a search result is often assembled from multiple relational tables, traditional IR-style ranking and query evaluation methods cannot be applied directly. In this paper, we study the effectiveness and efficiency issues of answering top-k keyword queries in relational database systems. We propose a new ranking formula by adapting existing IR techniques based on a natural notion of a virtual document. Compared with previous approaches, our new ranking method is simple yet effective, and agrees with human perceptions. We also study efficient query processing methods for the new ranking method, and propose algorithms that make minimal accesses to the database. We have conducted extensive experiments on large-scale real databases using two popular RDBMSs. The experimental results demonstrate significant improvement over the alternative approaches in terms of retrieval effectiveness and efficiency.
Approximate Range-Sum Query Answering On Data Cubes With Probabilistic Guarantees A Cuzzocrea, W Wang, Journal of Intelligent Information Systems, Springer, Dordrecht, 2007, pp. 161 - 197
Approximate range aggregate queries are one of the most frequent and useful kinds of queries for Decision Support Systems (DSS), as they are widely used in many data analysis tasks. Traditionally, sampling-based techniques have been proposed to tackle this problem. However, their effectiveness degrades when the underlying data distribution is skewed. Another approach based on outlier management can limit the effect of data skew but fails to address other requirements of approximate range aggregate queries, such as error guarantees and query processing efficiency. In this paper, we present a technique that provides approximate answers to range aggregate queries on OLAP data cubes efficiently, with theoretical guarantees on the errors. Our basic idea is to build different data structures to manage outliers and the rest of the data. Carefully chosen outliers are organized in a quad-tree based indexing data structure to provide efficient access for query processing. A query-workload adaptive, tree-like synopsis data structure, called Tunable Partition-Tree (TP-Tree), is proposed to organize samples extracted from non-outlier data. Our experiments clearly demonstrate the merits of our technique in comparison with previous well-known techniques.
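The split between exactly stored outliers and sampled non-outlier data can be illustrated with a minimal sketch. It is not the paper's method: plain dictionaries stand in for the quad-tree and the TP-Tree, the data cube is reduced to a one-dimensional list, and the names `build_synopsis`, `approx_range_sum`, `outlier_threshold`, and `sample_rate` are hypothetical.

```python
import random


def build_synopsis(values, outlier_threshold, sample_rate, seed=0):
    """Split cell values into exact outliers and a uniform sample of the rest.

    Assumption: an "outlier" is any value whose magnitude reaches the
    threshold. The paper chooses outliers more carefully.
    """
    rng = random.Random(seed)
    outliers = {}  # index -> exact value, kept in full
    sample = {}    # index -> value, kept with probability sample_rate
    for i, v in enumerate(values):
        if abs(v) >= outlier_threshold:
            outliers[i] = v
        elif rng.random() < sample_rate:
            sample[i] = v
    return outliers, sample


def approx_range_sum(outliers, sample, sample_rate, lo, hi):
    """Estimate sum(values[lo..hi]) as exact outlier sum plus scaled sample sum."""
    exact = sum(v for i, v in outliers.items() if lo <= i <= hi)
    scaled = sum(v for i, v in sample.items() if lo <= i <= hi) / sample_rate
    return exact + scaled


values = [1, 2, 100, 3, 4, 200, 5]
outliers, sample = build_synopsis(values, outlier_threshold=50, sample_rate=0.5)
estimate = approx_range_sum(outliers, sample, 0.5, lo=1, hi=5)
```

Because the large values are answered exactly, only the small, low-variance remainder is subject to sampling error, which is the intuition behind handling skew this way.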
Visual Specification And Optimization Of XQuery Using VXQ R Choi, R Wong, W Wang, Database and Expert Systems Applications, 2006, S. Bressan, et al. Springer, Poland, 2006, pp. 161 - 171
Towards Multidimensional Subspace Skyline Analysis J Pei, Y Yuan, X Lin, W Jin, M Ester, Q Liu, W Wang, Y Tao, J Yu, Q Zhang, ACM Transactions on Database Systems, ACM Press, New York, USA, 2006, pp. 1335 - 1381
Space-Efficient Relative Error Order Sketch Over Data Streams Y Zhang, X Lin, J Xu, F Korn, W Wang, 22nd International Conference on Data Engineering, Proceedings, L. Liu, A. Reuter, et al. IEEE Computer Society, Los Alamitos, CA, USA, 2006, pp. 51 - 52
Efficient Computation Of K-Medians Over Data Streams Under Memory Constraints Z Chong, J Yu, Z Zhang, X Lin, W Wang, A Zhou, Journal of Computer Science and Technology, Science Press, Beijing, 2006, pp. 284 - 296
In this paper, we study the problem of efficiently computing k-medians over high-dimensional and high-speed data streams. The focus of this paper is on minimizing the CPU time needed to handle high-speed data streams, on top of the requirements of high accuracy and small memory. Our work is motivated by the following observation: the existing algorithms have similar approximation behaviors in practice, even though they make noticeably different worst-case theoretical guarantees. The underlying reason is that, in order to achieve a high approximation level with the smallest possible memory, they need rather complex techniques to maintain a sketch along the time dimension by using existing off-line clustering algorithms. Those clustering algorithms cannot guarantee the optimal clustering result over data segments in a data stream but accumulate errors over segments, which makes most algorithms behave the same in terms of approximation level in practice. We propose a new grid-based approach which divides the entire data set into cells (not along the time dimension). We can achieve a high approximation level based on a novel concept called (1 - epsilon)-dominant. We further extend the method to the data stream context by leveraging a density-based heuristic and frequent item mining techniques over data streams. We only need to apply an existing clustering algorithm once to compute k-medians, on demand, which reduces CPU time significantly. We conducted extensive experimental studies, and show that our approaches outperform other well-known approaches.
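The idea of dividing the data into cells rather than time segments can be sketched as follows. This is only an illustration under stated assumptions: it uses a uniform grid with a hypothetical `cell_width` parameter, does not implement the (1 - epsilon)-dominant concept or the density heuristic, and the function name `grid_summary` is invented for this example.

```python
from collections import defaultdict


def grid_summary(points, cell_width):
    """Summarize points by grid cell as {cell: (count, centroid)}.

    Each point is assigned to the cell obtained by floor-dividing every
    coordinate by cell_width. Only a count and a coordinate sum per cell
    are kept, so memory depends on the number of occupied cells rather
    than on the number of points seen.
    """
    counts = defaultdict(int)
    sums = {}
    for p in points:
        cell = tuple(int(x // cell_width) for x in p)
        counts[cell] += 1
        if cell in sums:
            sums[cell] = [a + b for a, b in zip(sums[cell], p)]
        else:
            sums[cell] = list(p)
    return {
        cell: (counts[cell], tuple(s / counts[cell] for s in sums[cell]))
        for cell in counts
    }


summary = grid_summary([(0.25, 0.25), (0.75, 0.5), (2.5, 2.5)], cell_width=1.0)
# summary maps cell (0, 0) to a count of 2 and cell (2, 2) to a count of 1
```

The weighted centroids could then be handed to any off-line k-median algorithm a single time, on demand, instead of re-clustering every stream segment.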
Term Graph Model For Text Classification W Wang, X Lin, D Do, Advanced Data Mining and Applications, First International Conference, Springer-Verlag Berlin, Berlin, 2005, pp. 19 - 30
Subgraph Join: Efficient Processing Subgraph Queries On Graph-Structured XML Document H Wang, W Wang, X Lin, J Li, Advances in Web-Age Information Management, 6th International Conference, Springer-Verlag Berlin, Berlin, 2005, pp. 68 - 80
Stabbing The Sky: Efficient Skyline Computation Over Sliding Windows X Lin, Y Yuan, W Wang, H Lu, Proceedings of the 21st International Conference on Data Engineering, IEEE, USA, 2005, pp. 502 - 513
Similarity Search With Implicit Object Features Y Luo, Z Liu, X Lin, W Wang, J Yu, Advances in Web-Age Information Management, 6th International Conference, Springer-Verlag Berlin, Berlin, 2005, pp. 150 - 161
Practical Indexing XML Document For Twig Query H Wang, W Wang, J Li, X Lin, R Wong, Advances in Computer Science - ASIAN 2005, Data Management on the Web, 10th Asian Computing Science Conference, Springer, Germany, 2005, pp. 208 - 222
Locating Motifs In Time-Series Data Z Liu, J Yu, X Lin, W Wang, H Lu, Advances in Knowledge Discovery and Data Mining, Springer, Germany, 2005, pp. 343 - 353
Labeling Scheme And Structural Joins For Graph-Structured XML Data W Wang, X Lin, J Li, H Wang, Web Technologies Research and Development, 7th Asia-Pacific Conference, Springer-Verlag Berlin, Berlin, Germany, 2005, pp. 277 - 289
Efficient Processing Of XML Path Queries Using The Disk-Based F&B Index W Wang, H Wang, H Lu, H Jian, X Lin, J Li, The 31st International Conference on Very Large Databases, ACM, USA, 2005, pp. 145 - 156
Efficient Computation Of The Skyline Cube Y Yuan, X Lin, Q Liu, W Wang, J Yu, Q Zhang, Proceedings of the 31st International Conference on Very Large Databases, ACM, USA, 2005, pp. 241 - 252
Answering Approximate Range Aggregate Queries On OLAP Data Cubes With Probabilistic Guarantees A Cuzzocrea, W Wang, U Matrangolo, Data Warehousing and Knowledge Discovery, Proceedings, Springer-Verlag Berlin, Berlin, 2004, pp. 97 - 107