ISSN 1000-1239 CN 11-1777/TP

Table of Contents

01 September 2015, Volume 52 Issue 9
Truth Discovery Based Credibility of Data Categories on Data Sources
Ma Ruxia, Meng Xiaofeng
2015, 52(9):  1931-1940.  doi:10.7544/issn1000-1239.2015.20140684
Abstract ( 1308 )   HTML ( 1)   PDF (2043KB) ( 1224 )
The popularization of the Internet and the development of e-commerce have changed the way people access information and consume. For most people, the Web has become an important source of information. Meanwhile, the information quality problem is becoming increasingly prominent: a great deal of information is outdated, incorrect, false or biased. In particular, the problem of conflicting information provided by different websites is obvious, and how to find the truth among conflicting information has to be solved. To our knowledge, no existing method considers the credibility of data categories on data sources when discovering truth. Therefore, we formulate the problem of truth discovery based on the credibility of data categories on data sources. In this paper, two methods are proposed to detect the credibility differences of data categories on sources, and a Bayesian method is used to iteratively compute data source quality and data accuracy. Additionally, data coverage and the difficulty of each object are considered to improve the accuracy of truth finding. Experiments on a real data set show that our algorithms can significantly improve the accuracy of truth discovery.
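The iterative scheme the abstract describes, alternating between estimating truths from source trust and re-estimating trust from agreement with those truths, can be sketched roughly as follows. The source names, the weighted-voting rule, and the smoothed-accuracy update are illustrative assumptions, not the paper's exact Bayesian formulation:

```python
# A minimal truth-discovery sketch: iterate between (1) weighted voting
# for each object's value and (2) re-estimating each source's trust as
# its agreement with the current truths. All names and rules below are
# illustrative, not the paper's exact method.

def truth_discovery(claims, n_rounds=20):
    """claims: {source: {object: value}}. Returns (truths, trust)."""
    trust = {s: 0.8 for s in claims}           # uniform prior trust
    truths = {}
    for _ in range(n_rounds):
        # 1) vote for each object's value, weighted by source trust
        votes = {}
        for s, facts in claims.items():
            for obj, val in facts.items():
                votes.setdefault(obj, {})
                votes[obj][val] = votes[obj].get(val, 0.0) + trust[s]
        truths = {obj: max(vals, key=vals.get) for obj, vals in votes.items()}
        # 2) re-estimate trust as each source's smoothed accuracy
        for s, facts in claims.items():
            correct = sum(truths[o] == v for o, v in facts.items())
            trust[s] = (correct + 1) / (len(facts) + 2)
    return truths, trust

claims = {
    "siteA": {"population": "8M", "mayor": "Li"},
    "siteB": {"population": "8M", "mayor": "Wang"},
    "siteC": {"population": "9M", "mayor": "Li"},
}
truths, trust = truth_discovery(claims)
print(truths)   # {'population': '8M', 'mayor': 'Li'}
```

Sources that agree with the discovered truths (here, siteA) end up with higher trust than sources that conflict with them.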
Mass of Short Texts Clustering and Topic Extraction Based on Frequent Itemsets
Peng Min, Huang Jiajia, Zhu Jiahui, Huang Jimin, Liu Jiping
2015, 52(9):  1941-1953.  doi:10.7544/issn1000-1239.2015.20140533
Abstract ( 1634 )   HTML ( 5)   PDF (1801KB) ( 1825 )
Short texts generated in social media have the characteristics of volume, velocity, low quality and variety, which makes vector-space-based clustering methods face the challenges of high dimensionality, feature sparsity and noise. In this paper, we propose a short-text clustering and topic extraction (STC-TE) framework based on frequent itemsets mined from the texts. The framework first studies the impact of multiple features on short-text quality. Then, a large number of frequent itemsets are mined from the high-quality short-text set with a low support threshold, and a similar-itemset filtering strategy is devised to discard most of the unimportant frequent itemsets. Furthermore, based on itemset similarity evaluated via relevant texts, we propose a cluster self-adaptive spectral clustering (CSA_SC) algorithm to group the itemsets into topic clusters. Finally, the large-scale short texts are classified into the associated clusters according to the topic words extracted from the frequent-itemset clusters. The framework is tested on a dataset of one million Sina Weibo posts to evaluate the performance of important frequent itemset selection and clustering, topic word extraction, and large-scale short-text classification. Experimental results show that the STC-TE framework can achieve topic extraction and large-scale short-text clustering with high accuracy.
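The first step of the pipeline, mining frequent itemsets from short texts under a low support threshold, might look roughly like the sketch below. The toy texts, tokenization, and threshold are assumptions; the paper's quality filtering and spectral clustering stages are not reproduced here:

```python
# Illustrative sketch of low-support frequent itemset mining over short
# texts: each text becomes a set of words, and every word combination
# up to max_size is counted across texts.
from collections import Counter
from itertools import combinations

def frequent_itemsets(texts, min_support=2, max_size=2):
    docs = [set(t.split()) for t in texts]
    counts = Counter()
    for d in docs:
        for k in range(1, max_size + 1):
            for itemset in combinations(sorted(d), k):
                counts[itemset] += 1
    # keep only itemsets reaching the (deliberately low) support level
    return {s: c for s, c in counts.items() if c >= min_support}

texts = ["apple stock rises", "apple stock falls", "rain in paris"]
fi = frequent_itemsets(texts)
print(fi[("apple", "stock")])   # 2
```

Itemsets like ("apple", "stock") that recur across texts survive the support cut, while one-off words such as "rain" are discarded.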
Semantic-Enhanced Spatial Keyword Search
Han Jun, Fan Ju, Zhou Lizhu
2015, 52(9):  1954-1964.  doi:10.7544/issn1000-1239.2015.20140686
Abstract ( 954 )   HTML ( 2)   PDF (2421KB) ( 617 )
Spatial keyword search finds points of interest (POIs) that are both relevant to the user's query intent and close to the query location. It has many important applications, such as map search. Previous methods have the limitation that they only consider the textual relevance of POIs to the query keywords and neglect the semantics of the query, so they may fail to return relevant results or may return many irrelevant ones. To address this problem, this paper introduces a semantic-enhanced spatial keyword search method named S3 (semantic-enhanced spatial keyword search). Given a query, S3 analyzes the semantics of the query keywords to measure the semantic distance of each POI to the query, and then ranks POIs with a novel mechanism that combines semantic and spatial distance. S3 faces the following challenges. First, capturing query semantics is difficult; S3 introduces knowledge bases for this purpose, together with a ranking function that considers both semantic and spatial distance. Second, instant search over large-scale POI data sets is required; to address this, we devise a novel index structure, GRTree, and develop effective pruning techniques based on it. Extensive experiments on a real dataset show that S3 not only produces high-quality results but also has good efficiency and scalability.
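The core ranking idea, scoring each POI by a combination of semantic distance and spatial distance, can be sketched as below. The linear combination with weight alpha, the normalization, and the toy POIs and semantic distances are assumptions for illustration; the paper's actual scoring function may differ:

```python
# Hedged sketch: rank POIs by a weighted sum of semantic distance
# (how far the POI's meaning is from the query) and spatial distance
# (how far its location is). Lower score = better match.
import math

def score(poi, query_loc, sem_dist, alpha=0.5, max_dist=10.0):
    spatial = math.dist(poi["loc"], query_loc) / max_dist  # normalize to [0,1]
    return alpha * sem_dist[poi["name"]] + (1 - alpha) * spatial

pois = [
    {"name": "noodle bar", "loc": (1.0, 1.0)},
    {"name": "pizza place", "loc": (0.5, 0.5)},
]
# assumed semantic distances of each POI to the query "Italian food"
sem = {"noodle bar": 0.9, "pizza place": 0.1}
ranked = sorted(pois, key=lambda p: score(p, (0.0, 0.0), sem))
print(ranked[0]["name"])   # pizza place
```

Even though both POIs are spatially close, the semantically nearer "pizza place" wins, which is the behavior a purely textual method would miss.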
A Nash-Pareto Strategy Based Automatic Data Distribution Method and Its Supporting Tool
Wang Xiaoyan, Chen Jinchuan, Guo Xiaoyan, Du Xiaoyong
2015, 52(9):  1965-1975.  doi:10.7544/issn1000-1239.2015.20140832
Abstract ( 720 )   HTML ( 0)   PDF (4497KB) ( 616 )
The era of big data brings new challenges to data storage and management. With the dramatic increase of data volume, automatic data distribution has become one of the key techniques, and an intractable problem, for distributed systems. Based on studies of data, workload and nodes in this field, this work abstracts the data distribution problem into a triangle model called DaWN (data, workload, node), and summarizes their pairwise relationships as data fragmentation, data allocation and workload processing. Following DaWN, it proposes an automatic data distribution solution for large-scale on-line transaction processing (OLTP) applications, and discusses the details and interactions of each module in this consolidated architecture. Combined with our existing research, it puts the optimal equilibrium of the Nash-Pareto strategy into practice. A series of experiments shows that the proposed approach achieves good overall performance and effectiveness. Meanwhile, this work also implements a prototype tool called ADDvisor to support automatic data distribution, in the expectation of smoothly transferring more research into real-world practice and effectively coordinating automatic data distribution in large-scale distributed OLTP applications.
Similarity Query Processing Algorithm over Data Stream Based on LCSS
Wang Shaopeng, Wen Yingyou, Zhao Hong
2015, 52(9):  1976-1991.  doi:10.7544/issn1000-1239.2015.20140479
Abstract ( 986 )   HTML ( 0)   PDF (5482KB) ( 697 )
Nowadays, similarity queries over data streams are essential in many applications, such as smart homes and environmental monitoring. However, few existing studies take the LCSS (longest common subsequence) as the similarity measure. The naive algorithm obtains the query result by comparing the threshold with the value of the measure, which is computed by the basic dynamic programming method. This paper considers similarity queries over data streams based on the LCSS. The D2S-PC algorithm is proposed to overcome the drawback that the query result cannot be obtained until all elements of the full dynamic programming matrix have been computed. It defines the PS and CC domains of the matrix over every window, and effectively exploits the characteristics of the similarity query and of the matrix members in these two domains. With this algorithm, the similarity query result can be obtained before the final length of the LCSS is calculated, which greatly reduces the computation over matrix members compared with the original algorithm. Extensive experiments on real and synthetic datasets show that D2S-PC handles LCSS-based similarity queries over data streams effectively while keeping the query results precise, and can meet the requirements of practical applications.
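The baseline the paper improves on is the standard LCSS dynamic program, and the improvement exploits the fact that a threshold query can often be answered before the full matrix is computed. The sketch below shows that baseline with a simple early exit; the bound used (the LCSS can grow by at most one per remaining row) is a standard observation, not the paper's PS/CC construction:

```python
# Hedged sketch: row-by-row LCSS dynamic programming with two early
# exits for a threshold query. Returns True iff LCSS(a, b) >= threshold.

def lcss_similar(a, b, threshold):
    m, n = len(a), len(b)
    prev = [0] * (n + 1)
    for i in range(1, m + 1):
        cur = [0] * (n + 1)
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
            else:
                cur[j] = max(prev[j], cur[j - 1])
        if cur[n] >= threshold:           # already long enough: answer known
            return True
        if cur[n] + (m - i) < threshold:  # even matching every remaining row fails
            return False
        prev = cur
    return prev[n] >= threshold

print(lcss_similar("abcde", "axcye", 3))   # True  (LCSS "ace")
print(lcss_similar("abc", "xyz", 1))       # False
```

In the second call the negative exit fires on the last row without any further work; on longer streams such exits can skip most of the matrix, which is the spirit of answering before the final LCSS length is known.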
Algorithms for Improving Data Currency
Li Mohan, Li Jianzhong
2015, 52(9):  1992-2001.  doi:10.7544/issn1000-1239.2015.20140687
Abstract ( 882 )   HTML ( 0)   PDF (2027KB) ( 823 )
Fixing obsolete data to the latest values is a common challenge in improving data quality. Previous data repairing methods can be divided into two categories: methods based on quality rules and methods based on statistical techniques. The former can express domain knowledge, but fall short in detecting and representing some complex relationships in data. The latter can fix some errors that quality rules cannot detect or repair, but current methods need to learn complex conditional probability distributions and cannot incorporate domain knowledge effectively. To overcome the shortcomings of both kinds of methods, this paper focuses on combining quality rules and statistical techniques to improve data currency. A new class of rules for repairing data currency is proposed: domain knowledge is expressed directly by the antecedents and consequents of the rules, and statistical information is described by the distribution table attached to each rule. Based on these rules, algorithms for learning repairing rules and for fixing obsolete data are provided. Experiments on both real and synthetic data demonstrate the efficiency and effectiveness of the methods.
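The rule shape described, an antecedent that matches a stale record plus a consequent backed by a distribution table, might look like the following sketch. The rule, the distribution table, and the record are invented purely for illustration and are not taken from the paper:

```python
# Hedged sketch: a currency-repair rule. If the record matches the
# antecedent, the target field is updated to the most probable current
# value from the rule's distribution table. All data here is invented.

def apply_rule(record, rule):
    if all(record.get(k) == v for k, v in rule["antecedent"].items()):
        dist = rule["distribution"]              # value -> probability
        record[rule["target"]] = max(dist, key=dist.get)
    return record

rule = {
    # domain knowledge: a lecturer title unchanged for 6 years is stale
    "antecedent": {"title": "lecturer", "years_since_update": 6},
    "target": "title",
    # statistical information: likely current values and probabilities
    "distribution": {"lecturer": 0.2, "associate professor": 0.7, "professor": 0.1},
}
print(apply_rule({"title": "lecturer", "years_since_update": 6}, rule))
# {'title': 'associate professor', 'years_since_update': 6}
```

The antecedent/consequent pair carries the domain knowledge, while the distribution table carries the learned statistics, which mirrors the combination the abstract argues for.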
Index of Indoor Moving Objects for Multiple Queries
Ben Tingting, Qin Xiaolin, Xu Jianqiu
2015, 52(9):  2002-2013.  doi:10.7544/issn1000-1239.2015.20131230
Abstract ( 777 )   HTML ( 1)   PDF (3322KB) ( 461 )
Moving object indexes are widely used in location-based services. Since people spend large parts of their lives in indoor spaces (e.g. hospitals, shopping malls and subway systems), effective management of indoor mobile data is very important. Existing indoor moving object indexes focus on historical data and support only one type of query. In this paper, we propose a novel index called MQII (multiple queries indoor index), which supports not only historical and present-time queries but also object queries and range queries. MQII is built on a graph-based model and, through its object-list and bucket-list structures, indexes data along both the object dimension and the spatio-temporal dimension. To improve query performance, we present an RFID (radio frequency identification) data preprocessing method that reduces the size of the input data sets for MQII. Furthermore, effective update and query algorithms are developed. Experimental results show that, compared with existing indoor moving object indexes, the preprocessing reduces the amount of data, and the proposed index supports not only historical and present-time queries but also efficient object location queries, trajectory queries and range queries. The method can be applied in various indoor spaces such as office buildings, hospitals and hotels.
Calculation Results Characteristics Extract and Reuse Strategy Based on Hive
Xie Heng, Wang Mei, Le Jiajin, Sun Li
2015, 52(9):  2014-2024.  doi:10.7544/issn1000-1239.2015.20140548
Abstract ( 923 )   HTML ( 1)   PDF (2583KB) ( 633 )
Jobs in a MapReduce workflow need to materialize intermediate data to HDFS (Hadoop distributed file system), which causes a large amount of I/O overhead and low efficiency. Building on the representative system Hive, this paper proposes a strategy to match and reuse MapReduce calculation results by extracting and storing the characteristics of those results. First, we define the Join-Graph, Join-Object and other structures derived from the query conditions, which are used to find reusable results, and we propose an algorithm that generates the Join-Objects of a query from the abstract syntax tree produced by the HiveQL (Hive query language) parser. Next, by traversing the candidate Join-Object list, an algorithm generates the best reuse plan, covering both single and multiple Join-Object reuse. In addition, we provide three methods to increase the reuse probability: multi-key selection, arithmetic delay and semantic understanding. Finally, we conduct experiments with the TPC-H and SSB benchmarks. The results show that on TPC-H, efficiency improves by 28%-52% when reusing a single Join-Object and by up to 75% when reusing multiple Join-Objects, with all 22 queries improved by 15.7% on average; on SSB, efficiency improves by 40%-76%, 55% on average.
An Efficient Filtering-and-Refining Retrieval Method for Big Audio Data
Zhang Xingzhong, Wang Yunsheng, Zeng Zhi, Niu Baoning
2015, 52(9):  2025-2032.  doi:10.7544/issn1000-1239.2015.20140694
Abstract ( 877 )   HTML ( 0)   PDF (2118KB) ( 474 )
Fast audio retrieval is demanding because of the high dimensionality and ever-growing volume of audio on the Internet. Although audio fingerprinting can greatly reduce dimensionality while keeping audio identifiable, the dimension of audio fingerprints is still too high to scale to big audio data: the number of audios to be checked must be kept small. This paper proposes a robust and fast retrieval method for big audio data that combines audio fingerprinting with a filtering-and-refining strategy. An audio middle fingerprint of considerably smaller dimension, obtained by applying the bag-of-features (BoF) technique to the classical Philips audio fingerprint, quickly filters the most likely audios, reducing the search scope with a 130-fold speedup over Fibonacci hashing retrieval. A matching algorithm further reduces the computational complexity by comparing samples of two audios at fixed intervals against thresholds, yielding a maximal speedup of 140 times. Experimental results show that the average time to retrieve audio clips of various lengths from about 100000 audios is less than 1s. After MP3 conversion, resampling and random shearing, the recall rates all remain above 99.47%, and the theoretical accuracy is close to 100%.
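The filter-and-refine idea can be sketched as follows: a compact summary fingerprint cheaply filters candidates, and only the survivors are verified against the full bit fingerprint. The 8-bit summary function, the thresholds, and the toy fingerprints are all invented assumptions, not the paper's BoF middle fingerprint:

```python
# Hedged sketch of filter-and-refine fingerprint matching. Fingerprints
# are modeled as integers compared by Hamming distance; "summarize"
# stands in for a low-dimensional middle fingerprint (assumption).

def hamming(a, b):
    return bin(a ^ b).count("1")

def summarize(fp, bits=8):
    # crude low-dimensional summary: sample every step-th bit (assumption)
    step = max(1, fp.bit_length() // bits)
    return sum(((fp >> (i * step)) & 1) << i for i in range(bits))

def search(query_fp, database, coarse_t=3, fine_t=8):
    q_sum = summarize(query_fp)
    # filter: cheap comparison on the small summary
    candidates = [fp for fp in database
                  if hamming(q_sum, summarize(fp)) <= coarse_t]
    # refine: full fingerprint comparison on the survivors only
    return [fp for fp in candidates if hamming(query_fp, fp) <= fine_t]

result = search(0b1011, [0b1011, 0b0000], fine_t=2)
print(result)   # [11]
```

The expensive full comparison runs only on the handful of candidates the summary lets through, which is how the search scope shrinks without losing the true match.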
Top-k Medical Images Query Based on Association Graph
Li Pengyuan, Pan Haiwei, Li Qing, Han Qilong, Xie Xiaoqin, Zhang Zhiqiang
2015, 52(9):  2033-2045.  doi:10.7544/issn1000-1239.2015.20140692
Abstract ( 794 )   HTML ( 1)   PDF (3007KB) ( 510 )
Patient-to-patient comparison, and especially image-to-image comparison, plays an important role in medicine, since doctors invariably make diagnoses based on prior experience with similar cases. Finding similar medical images in a database is very valuable, because similar pathological changes in prior patients' images and the corresponding reports can assist doctors in diagnosing current patients. Therefore, advanced medical image retrieval techniques have been widely studied in recent years to improve accuracy. However, with the increasing number of medical images, processing time has become another problem in this domain. As doctors are only interested in the most similar k results, a novel association graph model is proposed for top-k medical image query in this paper. The fuzzy expressions in an association graph can describe the similarity between images effectively. Moreover, a series of correlation measurements is proposed for similarity reasoning, and the top-k query method is built on the properties of these measurements. Furthermore, four walk strategies are studied to accelerate and stabilize the top-k process. Experimental results show higher efficiency and effectiveness in comparison with state-of-the-art methods.
Performance Analysis and Optimization for In-Network Caching Replacement in Information Centric Networking
Wang Yonggong, Li Zhenyu, Wu Qinghua, Xie Gaogang
2015, 52(9):  2046-2055.  doi:10.7544/issn1000-1239.2015.20140101
Abstract ( 872 )   HTML ( 2)   PDF (3260KB) ( 618 )
Information centric networking (ICN) is a promising framework for evolving the current network architecture, advocating ubiquitous in-network caching to enhance content delivery. Consequently, the cache replacement mechanism has become a hot topic in ICN research. In this paper, we first study the performance of the de facto standard cache replacement policy, least recently used (LRU). We find that if an interest for certain content is not satisfied at the first LRU cache node it hits, it is hardly ever satisfied along the remaining path. We then propose a pre-filtering-based cache replacement policy to mitigate cache degradation in multi-hop LRU caches: a pre-filtering LRU cache is placed in front of the real content store, filtering out non-popular content and improving the hit ratio of the real content cache. Extensive experiments based on real-life topologies show that our pre-filtering policy greatly improves the cache hit ratio of cache nodes in typical ICN scenarios.
A Dynamic Network Risk Assessment Model Based on Attacker’s Inclination
Ma Chunguang, Wang Chenghong, Zhang Donghong, Li Yingtao
2015, 52(9):  2056-2068.  doi:10.7544/issn1000-1239.2015.20140177
Abstract ( 928 )   HTML ( 2)   PDF (3642KB) ( 806 )
This article proposes a new dynamic network risk analysis model based on attackers' inclination, in order to solve some problems of traditional attack-graph-based risk analysis. Traditional attack-graph-based risk assessment relies heavily on known vulnerability databases and only analyzes the attributes of atomic attacks, ignoring the relationship between attack strategies and attackers' inclination. Our model takes both existing vulnerabilities and unknown threats into consideration, evaluates the pressure on attackers during different attack periods, and thereby quantifies attackers' inclination dynamically in the network environment. We then add attacker-inclination factors and atomic-attack attributes into the graph-based risk assessment model, creating a new type of attack graph that incorporates these factors. Finally, we set up a dynamic risk assessment method using a Bayesian reasoning engine: the static attack graph is converted into a dynamic Bayesian attack graph, and the posterior probabilities computed by the engine realize the dynamic risk assessment. We establish a real-world experimental environment to simulate the model and validate its function. Experimental results demonstrate the rationality of the model and show that the system is well suited to real-time risk assessment in actual network environments.
Research on the Virtualization of Future Internet
Yu Tao, Bi Jun, Wu Jianping
2015, 52(9):  2069-2082.  doi:10.7544/issn1000-1239.2015.20140207
Abstract ( 880 )   HTML ( 4)   PDF (3254KB) ( 641 )
The current Internet is designed on the end-to-end principle and is jointly operated by many Internet service providers with different objectives and policies. Updating the Internet architecture requires a consensus among almost all of them, which makes the direct deployment of radically new architectures and protocols nearly impossible. To fend off the ossification of the Internet architecture, network virtualization has been proposed to add diversity to the future Internet. By running various architectures on a common substrate network, Internet virtualization can promote innovation and encourage the emergence of many kinds of new applications. In this paper, we first describe the application context of network virtualization and the general virtualization methods of traditional networks, then set up a classification framework for related research on Internet virtualization. We explore the work from the perspectives of both Internet architecture and test beds, and summarize the development trends of research on future Internet virtualization. We conclude that network virtualization is an indispensable part of the future Internet and plays two roles in it: one for the underlay network and one for each future Internet paradigm, and both are required for the future Internet to realize its design goals.
Analysis of Maximum Steady-State Throughput for Temperature-Constrained Multicore Processors
Zhang Biying, Chen Hongsong, Cui Gang, Fu Zhongchuan
2015, 52(9):  2083-2093.  doi:10.7544/issn1000-1239.2015.20140656
Abstract ( 737 )   HTML ( 0)   PDF (3480KB) ( 384 )
With the increasing power density of multicore processors, temperature-constrained performance analysis has become a key component of early design optimization for multicore processors. The temperatures of processors vary significantly with the tasks being run. However, most existing steady-state analyses assume that all tasks have the same power dissipation and distribution, and do not consider the impact of task variation on the performance of thermal-aware multicore processors. To improve the accuracy of steady-state throughput analysis under a temperature constraint, task variation is taken into account, and a new method for maximum throughput analysis is proposed, based on the HotSpot thermal model, for multicore processors that use dynamic voltage and frequency scaling (DVFS) to manage temperature. Task characteristics are incorporated into the performance analysis model, and the relationship among the characteristics of tasks on the various cores is derived for the point at which the processor achieves maximum throughput. The analysis of maximum throughput under a temperature constraint is then transformed into a linear programming problem. Experimental results show that the proposed method achieves better analysis accuracy, and that task characteristics have a significant impact on the maximum steady-state throughput of temperature-constrained multicore processors.
Low Power Scheduling Algorithm for Mix Tasks Based on Constant Bandwidth Server
Zhang Yiwen, Guo Ruifeng, Deng Changyi
2015, 52(9):  2094-2104.  doi:10.7544/issn1000-1239.2015.20140611
Abstract ( 647 )   HTML ( 0)   PDF (2552KB) ( 401 )
We present CBSMTLPSA (constant bandwidth server mix task low power scheduling algorithm), a low-power scheduling algorithm based on a constant bandwidth server that targets mixed task sets in hard real-time systems. The mixed task set consists of periodic tasks with deadline constraints and aperiodic tasks with response-time requirements. CBSMTLPSA combines the DVS (dynamic voltage scaling) technology with the DPM (dynamic power management) technology and works in two phases. In the offline phase, it determines the offline speed of each task to make full use of processor resources. In the online phase, it reclaims slack time from already completed periodic tasks as well as from the server, and uses DVS to adjust the processor speed and reduce energy consumption. In addition, to further reduce energy consumption, it decides whether to apply DPM to save energy when the processor is idle. Simulation results show that CBSMTLPSA consumes 6.02%-34.14% less energy than the CBS/DRA-W (constant bandwidth server for dynamic reclaiming algorithm with workload) algorithm, and that its product of energy consumption and aperiodic-task response time is about 5.86%-34.06% lower than that of CBS/DRA-W.
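The DVS step relies on a standard observation: stretching a task into reclaimed slack at a lower speed cuts dynamic energy, because power falls faster than execution time grows. The cubic power model and the numbers below are illustrative assumptions, not the paper's processor model:

```python
# Hedged sketch of why slack reclamation plus DVS saves energy.
# Assumed model: dynamic power ~ speed^3, time = cycles / speed,
# so energy ~ cycles * speed^2.

def dynamic_energy(wcet_cycles, speed):
    """Energy = power * time under the assumed cubic power model."""
    return (speed ** 3) * (wcet_cycles / speed)

def reclaimed_speed(wcet_cycles, deadline, slack):
    """Lowest speed that still finishes by the deadline after absorbing slack."""
    return wcet_cycles / (deadline + slack)

full = dynamic_energy(1000, speed=1.0)
slow = dynamic_energy(1000, reclaimed_speed(1000, deadline=1000, slack=600))
print(slow / full)   # 0.390625: running at 0.625x speed saves ~61% energy
```

When no slack is available the processor idles instead, which is where the DPM decision (whether entering a low-power state pays off for the idle interval) takes over.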
Cache Load Balancing Oriented Dynamic Binary Translation
Li Zhanhui, Liu Chang, Meng Jianyi, Yan Xiaolang
2015, 52(9):  2105-2113.  doi:10.7544/issn1000-1239.2015.20140220
Abstract ( 629 )   HTML ( 1)   PDF (3193KB) ( 601 )
The rapidly increasing load on the instruction cache and data cache causes significant performance loss for a DBT (dynamic binary translator), and the imbalance between the growth rates of instruction-cache and data-cache load makes the situation worse. This paper therefore proposes a hardware-software co-designed DBT acceleration mechanism that speeds up the DBT by dynamically shifting instruction-cache load to the data cache. The key idea is a cache load balancing state for the microprocessor. When the microprocessor works in this state, the instruction cache behaves as usual, while the data cache is divided into two areas: a normal-access area and a load-balancing area. The normal-access area caches regular program data just as a traditional data cache does. The load-balancing area, in contrast, does not cache regular program data but supports a load-transforming channel, which transforms and absorbs most of the instruction-cache load caused by the DBT's scheduler, whose main work is converting jump target addresses from the source machine code space to the target machine code space. Experimental results on the EEMBC (embedded microprocessor benchmark consortium) benchmarks show that instruction-cache accesses are reduced by 35%, data-cache accesses by 58%, and the overall performance of the QEMU (quick emulator) DBT is improved by 171%.
Chinese Zero Anaphora Resolution with Markov Logic
Song Yang, Wang Houfeng
2015, 52(9):  2114-2122.  doi:10.7544/issn1000-1239.2015.20140620
Abstract ( 917 )   HTML ( 1)   PDF (1011KB) ( 647 )
Chinese zero anaphora resolution includes two correlated subtasks: zero pronoun detection and zero anaphora resolution. Zero pronoun detection recognizes all the zero anaphors in a given text; these mainly arise from null subjects or null objects, and occur widely in Chinese, Japanese and Italian. Zero anaphora resolution determines the antecedent of each recognized zero anaphor, which has already appeared in the preceding text as a noun, pronoun or common noun phrase. Traditional methods generally employ common learning features to build independent classifiers for the two subtasks, but this cannot capture the relationships between them, e.g. that a recognized zero anaphor must be resolved, or that an item to be resolved must be a zero anaphor. In our method, the two subtasks are combined into a unified machine learning framework with Markov logic, enabling joint inference and joint learning. We use local formulas to describe zero pronoun detection and zero anaphora resolution respectively, and global formulas to represent the relationships between the two subtasks. We find that the joint learning model, which learns with inference, acquires more effective feature weights than an independent learning model that learns without inference. Experimental results on the OntoNotes 3.0 Chinese dataset show that our joint learning model achieves better results than independent learning models and other baseline methods.
Difference Selection Strategy for Solving Complex Multi-Objective Problems
Zheng Jinhua, Liu Lei, Li Miqing, Yin Cheng, Wang Kang
2015, 52(9):  2123-2134.  doi:10.7544/issn1000-1239.2015.20140472
Abstract ( 782 )   HTML ( 4)   PDF (4270KB) ( 678 )
Since the emergence of complex multi-objective problems in finance and economics, dealing with multi-objective problems has gained increasing attention, and improving the quality of the generated solutions is the key to solving them. Although a number of MOEAs (multi-objective evolutionary algorithms) have been proposed over the last several years for complex financial and economic multi-objective problems, little effort has been devoted to the generation of solutions in multi-objective optimization. Recently, we have proposed MODEA_DACR (multi-objective differential evolution algorithm via dynamic allocation of computational resources) to improve the quality of generated solutions. The algorithm uses two populations with different convergence rates to extract convergence information about the Pareto set, and then adjusts the parameters and the differential evolution selection strategy according to the obtained convergence rate. In addition, it dynamically allocates computational resources based on the convergence rate of the population. The proposed algorithm is compared with two state-of-the-art algorithms, ε-MOEA and MOEA/D-DRA, on a suite of test problems with complex Pareto sets. Experimental results show the effectiveness of the proposed algorithm.
A Machine Learning Method for Histopathological Image Automatic Annotation
Zhang Gang, Zhong Ling, Huang Yonghui
2015, 52(9):  2135-2144.  doi:10.7544/issn1000-1239.2015.20140683
Abstract ( 1431 )   HTML ( 4)   PDF (2767KB) ( 957 )
Histopathological images can reveal the cause and severity of diseases, which is important for clinical diagnosis. Automatic analysis of histopathological images can relieve doctors of manual annotation, leaving them more time to focus on special and difficult cases. However, the ambiguous relationship between local regions in a histopathological image and histopathological characteristics makes it difficult to construct a computer-aided model. This paper proposes an automatic annotation method for histopathological images based on multiple-instance multiple-label (MIML) learning, aiming to directly model the medical experience of doctors, in which each annotation term associated with an image corresponds to a visually recognizable local region. We propose a self-adaptive constrained region-cutting method that segments each image into several visually disjoint regions, and then extract features for each generated region based on texture and inner structure. The whole image is regarded as a bag and its regions as instances, so an image is expressed as a multiple-instance sample. We then propose S-MIMLGP, a sparse ensemble multiple-instance multiple-label learning algorithm based on Bayesian learning, and compare it with current multiple-instance single-label and multiple-instance multiple-label algorithms. Evaluation on a clinical dataset from the dermatology department of a large local hospital shows that the proposed method yields medically acceptable annotation accuracy, indicating its effectiveness.
A Graphical Modeling Language for Model Transformations
He Xiao, Ma Zhiyi, Shao Weizhong, Hu Changjun
2015, 52(9):  2145-2162.  doi:10.7544/issn1000-1239.2015.20148187
Abstract ( 738 )   HTML ( 0)   PDF (7262KB) ( 554 )
Model transformations, the core operations in model-driven development, are usually realized as special kinds of programs. They can achieve diverse conversions among models, code and even structured documents. With the rapid progress of model-driven methodology, model transformations are being applied to more and more complicated problems in industrial projects; as a result, they grow large in scale and complex in structure. Handling the development complexity of large transformations requires a graphical modeling language that can serve as a user-friendly notation for analyzing and designing them. This paper proposes VisTML (visual transformation modeling language), a visual modeling language for model transformation programs. VisTML comprises seven diagrams: the goal diagram, transformation declaration diagram, model type diagram, rule diagram, composite transformation diagram, testing diagram and configuration diagram, each of which includes a set of concepts derived from concrete transformation technologies. VisTML covers all the major phases of transformation development, and supports developers in describing a transformation from various viewpoints at different abstraction levels. Modeling a transformation with VisTML helps developers control complexity and facilitates communication. The paper also presents TModeler, the supporting tool for VisTML. Finally, three case studies demonstrate the feasibility and effectiveness of VisTML.