Sven Bader | I wrote this function for a full-text search. With adjustments it also works with XLSX, PPTX, ODT, ODP, ODS and PAGES. In the end you have to unzip the file and identify the correct XML inside it. With XLSX the structure is somewhat more branched.
Profan compatibility: the Unzip functions are available from X4 on; before that you have to find a DLL yourself. Utf8Decode (as used in the code) works from X3 on; before that you can build something yourself with Translate$() that replaces at least the most common characters, such as umlauts.
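A minimal sketch of such a Translate$() fallback, in case Utf8Decode is not available. The name Utf8DecodeFallback is hypothetical and not part of XProfan. It only covers the German umlauts and ß, and it assumes the text ends up in the Windows ANSI code page, where each of these characters arrives as two bytes starting with Chr$(195).

```xprofan
' Hypothetical fallback for versions without Utf8Decode.
' Assumption: only German umlauts and sharp s need to be handled.
Proc Utf8DecodeFallback
    Parameters text$
    ' UTF-8 byte pair (195, x) -> single ANSI character
    text$ = Translate$(text$, Chr$(195) + Chr$(164), Chr$(228)) 'a umlaut
    text$ = Translate$(text$, Chr$(195) + Chr$(182), Chr$(246)) 'o umlaut
    text$ = Translate$(text$, Chr$(195) + Chr$(188), Chr$(252)) 'u umlaut
    text$ = Translate$(text$, Chr$(195) + Chr$(132), Chr$(196)) 'A umlaut
    text$ = Translate$(text$, Chr$(195) + Chr$(150), Chr$(214)) 'O umlaut
    text$ = Translate$(text$, Chr$(195) + Chr$(156), Chr$(220)) 'U umlaut
    text$ = Translate$(text$, Chr$(195) + Chr$(159), Chr$(223)) 'sharp s
    Return text$
EndProc
```

It would be called in place of Utf8Decode, for example content$ = Utf8DecodeFallback(content$). All other multi-byte characters stay undecoded.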
Proc ReadDocx
    Parameters inputFile$
    Declare content$, filesize&, tempFile$, B#
    tempFile$ = $TempDir + "docxopener" + "\\word\\document.xml"
    'Unpack
    ifnot (FileExists(inputFile$))
        Print inputFile$;" not found!"
        Return
    endif
    UnZip inputFile$, ($TempDir + "docxopener") ,"word\document.xml"
    filesize& = FileSize(tempFile$)
    if (filesize& < 0)
        Print "Error while unpacking!"
        Return
    endif
    'Read (one extra byte so the buffer ends with a zero byte)
    Dim B#, filesize& + 1
    Assign #1, tempFile$
    OpenRW #1
    BlockRead(#1, B#, 0, filesize&)
    'Close the file first, then delete the temporary copy
    Close #1
    Erase #1
    content$ = String$(B#,0)
    Dispose B#
    'Clean up
    content$ = Utf8Decode(content$)
    content$ = Translate$(content$,"<w:p","\n<w:p")'Paragraph start DOCX
    content$ = Translate$(content$,":p>",":p>\n\n")'Paragraph end
    content$ = Translate$(content$,":tab/>",":tab/> ")'Tab
    content$ = Translate$(content$,":br/>",":br/>\n")'Break
    content$ = Translate$(content$,":line-break/>",":line-break/>\n")'Break
    content$ = Translate$(content$," "," ")
    Set("RegEx", 1)
    content$ = Translate$(content$,"<[^>]*>","")'Strip tags
    Set("RegEx", 0)
    content$ = Trim$(content$)
    Return content$
ENDPROC
Cls
messagebox ReadDocx("test.docx") ,"",0
Waitinput
End
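The adjustment for the other formats mentioned at the top mostly means unpacking a different XML file. A rough sketch under stated assumptions: InnerXmlPath is a hypothetical helper, and the paths are the ones typically found in such archives (ODT, ODS and ODP keep the text in content.xml; PPTX has one XML per slide, so only the first slide is named here; XLSX usually keeps cell strings in a shared strings file). PAGES is not covered.

```xprofan
' Hypothetical helper: choose the inner XML file by file extension.
' Assumption: typical archive layouts; returns "" for unknown types.
Proc InnerXmlPath
    Parameters inputFile$
    Declare ext$
    ext$ = Lower$(Right$(inputFile$, 4))
    if (ext$ = "docx")
        Return "word\document.xml"
    endif
    if (ext$ = "pptx")
        Return "ppt\slides\slide1.xml" 'first slide only
    endif
    if (ext$ = "xlsx")
        Return "xl\sharedStrings.xml" 'cell strings, not the sheets
    endif
    if (ext$ = ".odt")
        Return "content.xml"
    endif
    if (ext$ = ".ods")
        Return "content.xml"
    endif
    if (ext$ = ".odp")
        Return "content.xml"
    endif
    Return ""
EndProc
```

The result would then replace the fixed "word\document.xml" in the UnZip call and in tempFile$. The tag replacements need adapting as well, since the other formats use different element names for paragraphs.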