mysql - How to index documents with their metadata in a DB using Solr 5.1.0
I'm using Apache Solr to index documents for a search engine. These documents are stored locally on the file system. In order to do faceted search, I have to include these documents' metadata, which is stored in a MySQL DB.
Is there a way to simultaneously index these documents in the file system while attaching/indexing the corresponding metadata from the DB for faceted search?
If not, what is the alternative? Thanks in advance.
I'm not saying Drew's answer is incorrect, but I've found there is a more direct way to solve the problem.
After a couple of days of searching and posting on the Lucene forums, I was able to come up with a pretty comprehensive answer to this question. If you want to index a database and a file system, and have them submit one comprehensive document per file together with its metadata, there are two ways to go about it. One is better than the other.
The first way is to configure the DataImportHandler, or DIH. This involves changing solrconfig.xml to enable the use of the DIH, and you then need to create a new .xml file in the conf directory of the core you are using. This enables you to:
1) tap multiple data sources, and 2) use data from the database to find the file in the filesystem, i.e. in this case the file path.
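The solrconfig.xml change mentioned above might look roughly like the following. This is only a sketch, not the exact configuration from this answer: the handler path /dataimport, the config file name data-config.xml, and the lib directory paths are assumptions modelled on the example configs that ship with Solr 5.x, so adjust them to your own install and core layout.

```xml
<!-- Inside the <config> element of the core's solrconfig.xml. -->

<!-- Load the DIH jars (the "extras" jar provides TikaEntityProcessor) and the
     Tika libraries. Paths are assumptions; point them at your Solr install. -->
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />
<lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar" />

<!-- Register the DIH endpoint and tell it which config file to read.
     data-config.xml is an assumed name for the new file in the core's conf directory. -->
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>
```

The MySQL JDBC driver jar also has to be visible to Solr, for example through another lib directive or the core's lib directory. With a handler registered this way, an import is normally started by requesting /dataimport?command=full-import on the core.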
This link will help you configure multiple data sources and understand the capabilities of the DIH:
Data Import Handler documentation
This link will help you set up the DIH and connect it to the database. There are two parts, and I recommend looking at both.
Configuring the Data Import Handler and connecting to a database
This is my final DIH config file for reference:
<dataConfig>
  <!-- MySQL database holding the metadata -->
  <dataSource name="ds-db" type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/edms_metadata" user="root" password="*************" />
  <!-- Binary file source used to read the documents from the file system -->
  <dataSource name="ds-file" type="BinFileDataSource" />
  <document name="doc1">
    <!-- Outer entity: one row of metadata per document; the file path becomes the id -->
    <entity name="db-data" dataSource="ds-db" onError="skip"
            query="select textcontenturl as 'id', title, authorcreator from masterindex">
      <field column="textcontenturl" name="id" />
      <field column="title" name="title" />
      <field column="authorcreator" name="author" />
      <!-- Inner entity: Tika extracts the text of the file found at that path -->
      <entity name="file" dataSource="ds-file" onError="skip" processor="TikaEntityProcessor"
              url="${db-data.id}" format="text">
        <field column="text" name="text" />
      </entity>
    </entity>
  </document>
</dataConfig>
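Be warned that with large PDFs this makes Solr slow and may kill it. This is because the documents are processed inside Solr, and that processing can kill Tika. This is why I do not use this method, which leads me to the next method, the one I recommend for indexing rich documents.
You have to create your own indexer. The one I used is called SolrJ. It's a Java API that gives you access to Solr. Going into detail would take too long, so here is a link to a skeleton of SolrJ code that can be used to index the file system and the database separately. I was able to combine them to create a single Solr document from the set of metadata in the database and the files in the filesystem. I prefer this approach because of how it handles processing and because it gives me more control over the fields.
Here's a link to the skeleton tutorial. Good luck. I hope this helps.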