Use Java Regex to parse an XML file


For some reason I cannot use SAX or DOM parsers, so I need to parse the file with regex.

I want to extract values as key-value pairs (the key being the content of tag1, the value being the content of tag3). Some of the keys don't have a value; those keys have to be ignored.

XML file:

<main tag><element><tag1>key1</tag1><tag2>not intrested</tag2><tag3>value1</tag3></element><element><tag1>key2</tag1><tag2>not intrested</tag2></element><element><tag1>key3</tag1><tag2>not intrested</tag2><tag3>value3</tag3></element></main tag> 

The above XML file with indentation:

<main tag>
    <element>
        <tag1>key1</tag1>
        <tag2>not intrested</tag2>
        <tag3>value1</tag3>
    </element>
    <element>
        <tag1>key2</tag1>
        <tag2>not intrested</tag2>
    </element>
    <element>
        <tag1>key3</tag1>
        <tag2>not intrested</tag2>
        <tag3>value3</tag3>
    </element>
</main tag>

So from the above file I need to extract key1-value1 and key3-value3, ignoring key2 because it doesn't have a value.

Using Matcher:

final Pattern pattern = Pattern.compile("<tag1>(.+?)</tag1>.*<tag3>(.+?)</tag3>");
final Matcher matcher = pattern.matcher(xmlString); // xmlString holds the XML above
matcher.find();
System.out.println(matcher.group(1)); // gives key1
System.out.println(matcher.group(2)); // gives value3 instead of value1

Give this pattern a try:

"<(tag[13])>(.+?)</tag[13]>" 

Usage:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExample {
    public static void main(String[] args) throws Exception {
        String xmlString = "<maintag><element><tag1>key1</tag1><tag2>not intrested</tag2><tag3>value1</tag3></element><element><tag1>key2</tag1><tag2>not intrested</tag2></element><element><tag1>key3</tag1><tag2>not intrested</tag2><tag3>value3</tag3></element></maintag>";

        // Group 1 is the tag name (tag1 or tag3), group 2 is its text content
        Matcher matcher = Pattern.compile("<(tag[13])>(.+?)</tag[13]>").matcher(xmlString);
        while (matcher.find()) {
            System.out.println(matcher.group(1) + " " + matcher.group(2));
        }
    }
}

Results:

tag1 key1
tag3 value1
tag1 key2
tag1 key3
tag3 value3
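The output above still lists key2, which has no tag3. One way to get the key-value pairs the question asks for is to match each element block first and then look for tag1 and tag3 inside that block only. This is a minimal sketch, assuming the tags are never nested and carry no attributes; the class name TagPairs and the method extractPairs are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TagPairs {
    // Reluctant quantifiers stop at the first closing tag; DOTALL lets '.' span line breaks
    private static final Pattern ELEMENT = Pattern.compile("<element>(.*?)</element>", Pattern.DOTALL);
    private static final Pattern KEY = Pattern.compile("<tag1>(.+?)</tag1>", Pattern.DOTALL);
    private static final Pattern VALUE = Pattern.compile("<tag3>(.+?)</tag3>", Pattern.DOTALL);

    // Hypothetical helper: returns tag1 -> tag3 for every element that has both
    static Map<String, String> extractPairs(String xml) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher element = ELEMENT.matcher(xml);
        while (element.find()) {
            String body = element.group(1);
            Matcher key = KEY.matcher(body);
            Matcher value = VALUE.matcher(body);
            // Skip elements that lack either tag1 or tag3
            if (key.find() && value.find()) {
                pairs.put(key.group(1), value.group(1));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        String xmlString = "<maintag><element><tag1>key1</tag1><tag2>not intrested</tag2><tag3>value1</tag3></element><element><tag1>key2</tag1><tag2>not intrested</tag2></element><element><tag1>key3</tag1><tag2>not intrested</tag2><tag3>value3</tag3></element></maintag>";
        System.out.println(extractPairs(xmlString)); // prints {key1=value1, key3=value3}
    }
}
```

Restricting each search to one element body is what keeps key1 from being paired with value3, which is what the greedy .* in the original pattern did.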

Non-regex

Or you could use Document and DocumentBuilderFactory from the org.w3c.dom and javax.xml.parsers packages.

Something like:

import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomExample {
    public static void main(String[] args) throws Exception {
        String xmlString = "<maintag><element><tag1>key1</tag1><tag2>not intrested</tag2><tag3>value1</tag3></element><element><tag1>key2</tag1><tag2>not intrested</tag2></element><element><tag1>key3</tag1><tag2>not intrested</tag2><tag3>value3</tag3></element></maintag>";
        Document xmlDocument = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new InputSource(new ByteArrayInputStream(xmlString.getBytes("UTF-8"))));

        Node rootNode = xmlDocument.getFirstChild();
        if (rootNode.hasChildNodes()) {
            // Each element child node of the root
            NodeList elementsList = rootNode.getChildNodes();
            for (int i = 0; i < elementsList.getLength(); i++) {
                if (elementsList.item(i).hasChildNodes()) {
                    // Each tag child node of the element node
                    NodeList tagsList = elementsList.item(i).getChildNodes();
                    for (int i2 = 0; i2 < tagsList.getLength(); i2++) {
                        Node tagNode = tagsList.item(i2);
                        if (tagNode.getNodeName().matches("tag1|tag3")) {
                            System.out.println(tagNode.getNodeName() + " " + tagNode.getTextContent());
                        }
                    }
                }
            }
        }
    }
}

Results:

tag1 key1
tag3 value1
tag1 key2
tag1 key3
tag3 value3
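The DOM output above also lists key2 even though it has no value. If the goal is the key-value pairs themselves, a possible variation is to look up tag1 and tag3 per element with getElementsByTagName and keep only elements that have both. This is a sketch under the same sample input; the class name DomPairs and the method extractPairs are hypothetical.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DomPairs {
    // Hypothetical helper: returns tag1 -> tag3 for every element that has both
    static Map<String, String> extractPairs(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Map<String, String> pairs = new LinkedHashMap<>();
        NodeList elements = doc.getElementsByTagName("element");
        for (int i = 0; i < elements.getLength(); i++) {
            Element element = (Element) elements.item(i);
            NodeList keys = element.getElementsByTagName("tag1");
            NodeList values = element.getElementsByTagName("tag3");
            // Ignore elements that are missing the key or the value
            if (keys.getLength() > 0 && values.getLength() > 0) {
                pairs.put(keys.item(0).getTextContent(), values.item(0).getTextContent());
            }
        }
        return pairs;
    }

    public static void main(String[] args) throws Exception {
        String xmlString = "<maintag><element><tag1>key1</tag1><tag2>not intrested</tag2><tag3>value1</tag3></element><element><tag1>key2</tag1><tag2>not intrested</tag2></element><element><tag1>key3</tag1><tag2>not intrested</tag2><tag3>value3</tag3></element></maintag>";
        System.out.println(extractPairs(xmlString)); // prints {key1=value1, key3=value3}
    }
}
```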
