
Exploring PSI-MI XML Collections Using DescribeX

Berichte aus der medizinischen Informatik und Bioinformatik / Journal of Integrative Bioinformatics, 2007, Vol. 4(3), pp. 123–134

Reza Samavi, Mariano P. Consens, Shahan Khatchadourian, Thodoros Topaloglou

Abstract

PSI-MI has been endorsed by the protein informatics community as a standard XML data exchange format for protein-protein interaction datasets. Many public databases support the standard, but data providers interpret and instantiate the proposed XML schema with some degree of heterogeneity. Analyzing schema instantiation in large collections of XML data is a challenging task that existing tools do not support. In this study we use DescribeX, a novel technique for visualizing (semi-)structured XML formats, to quantitatively and qualitatively analyze PSI-MI XML collections at the instance level. The goal is to gain insight into schema usage and to study specific questions such as the adequacy of controlled vocabularies, the detection of common instance patterns, and the evolution of different data collections. Our analysis shows that DescribeX improves understanding of the instance-level structure of PSI-MI data sources and is a useful tool for standards designers, software developers, and PSI-MI data providers.
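The abstract does not describe how DescribeX works internally. As a rough illustration of what instance-level analysis of schema usage can involve, the following minimal sketch counts root-to-element label paths in each XML document and reports, for a collection, the fraction of documents in which each path occurs. It is a simple path summary assumed for illustration, not DescribeX's algorithm or API. The function names `label_path_counts` and `path_coverage` are hypothetical, and the XML fragments are simplified stand-ins that borrow PSI-MI-like element names; they are not valid PSI-MI documents.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Sequence


def label_path_counts(xml_text: str) -> Counter:
    """Count how often each root-to-element label path occurs in one document.

    Hypothetical helper for illustration. Namespace URIs are stripped from
    tags so that paths stay readable.
    """
    root = ET.fromstring(xml_text)
    counts: Counter = Counter()

    def local(tag: str) -> str:
        # '{namespace}name' -> 'name'; plain tags are returned unchanged
        return tag.rsplit("}", 1)[-1]

    def walk(elem: ET.Element, prefix: str) -> None:
        path = f"{prefix}/{local(elem.tag)}"
        counts[path] += 1
        for child in elem:
            walk(child, path)

    walk(root, "")
    return counts


def path_coverage(docs: Sequence[str]) -> dict:
    """Fraction of documents in which each label path occurs at least once.

    Hypothetical helper: paths with low coverage hint at optional or rarely
    instantiated parts of the schema.
    """
    seen: Counter = Counter()
    for text in docs:
        seen.update(set(label_path_counts(text)))
    return {path: n / len(docs) for path, n in seen.items()}


# Simplified fragments (not valid PSI-MI): one provider fills in participants,
# the other does not.
doc_a = (
    "<entrySet><entry><interactionList><interaction><participantList>"
    "<participant/><participant/>"
    "</participantList></interaction></interactionList></entry></entrySet>"
)
doc_b = "<entrySet><entry><interactionList><interaction/></interactionList></entry></entrySet>"

for path, frac in sorted(path_coverage([doc_a, doc_b]).items()):
    print(f"{frac:.2f}  {path}")
```

Comparing such summaries across providers, or across releases of one provider, is one plain way to surface the heterogeneity and evolution questions the abstract raises; a real analysis would also track attributes, values, and controlled-vocabulary terms.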
