bitrevo/netscape-bookmark-parser

 
 

netscape-bookmark-parser

This repo is a fork of kafene/netscape-bookmark-parser that adds nested output support and object-oriented data objects.

About

This library provides a generic NetscapeBookmarkParser class that is capable of parsing Netscape bookmark export files.

The motivations behind developing this parser are the following:

  • the Netscape format has a very loose specification: there is neither a DTD nor an XSL stylesheet to constrain how data is formatted
  • software and web services export bookmarks using a wild variety of attribute names and values
  • using standard SAX or DOM parsers is thus not straightforward.

How it works:

  • the input bookmark file is trimmed and sanitized to improve parsing results
  • the resulting data is then parsed using PCRE patterns that match the most likely:
    • attribute names: description vs. note, tags vs. labels, date vs. time, etc.
    • data formats: comma,separated,tags vs. space separated labels, UNIX epochs vs. human-readable dates, newlines & carriage returns, etc.
  • all successfully parsed links are returned with their attributes, as a nested tree of Folder and Page objects (see the Example below)
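
The matching and normalization steps listed above can be sketched roughly as follows. This is a minimal illustration, not the library's actual code: the helper names (sketchFindAttribute, sketchNormalizeTags, sketchNormalizeDate) and the candidate attribute names passed to them are assumptions made for the example.

```php
<?php
// Illustrative sketch only: these helpers are hypothetical and are not
// taken from the library's source.

/**
 * Return the value of the first attribute matching one of several candidate
 * names (for example "tags" vs. "labels"), or null when none is present.
 */
function sketchFindAttribute(string $line, array $names): ?string
{
    // Build an alternation such as (?:tags|labels) from the candidate names.
    $alternatives = implode('|', array_map(function (string $name): string {
        return preg_quote($name, '/');
    }, $names));
    $pattern = '/\b(?:' . $alternatives . ')\s*=\s*"([^"]*)"/i';

    return preg_match($pattern, $line, $match) === 1 ? $match[1] : null;
}

/**
 * Split a tag string on commas when it contains any, otherwise on whitespace.
 */
function sketchNormalizeTags(string $raw): array
{
    $separator = strpos($raw, ',') !== false ? '/\s*,\s*/' : '/\s+/';
    $tags = preg_split($separator, trim($raw), -1, PREG_SPLIT_NO_EMPTY);

    return array_map('strtolower', $tags);
}

/**
 * Accept either a UNIX epoch or a human-readable date; return an epoch,
 * or null when the value cannot be interpreted.
 */
function sketchNormalizeDate(string $raw): ?int
{
    $raw = trim($raw);
    if (preg_match('/^\d+$/', $raw) === 1) {
        return (int) $raw; // already an epoch
    }
    $timestamp = strtotime($raw);

    return $timestamp === false ? null : $timestamp;
}

// Hypothetical input line, in the style of a Netscape bookmark export.
$line = '<A HREF="https://private.tld" ADD_DATE="1563348673" TAGS="php, bookmarks">Secret stuff</A>';

$tags = sketchNormalizeTags(sketchFindAttribute($line, ['tags', 'labels']) ?? '');
$time = sketchNormalizeDate(sketchFindAttribute($line, ['add_date', 'date', 'time']) ?? '');
```

Trying several attribute names in a single alternation keeps one pattern per concept (tags, date, description) instead of one per exporting tool, which is the general idea behind tolerating loosely specified input.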

Example

Script:

<?php
require_once 'NetscapeBookmarkParser.php';

$parser = new NetscapeBookmarkParser();
$bookmarks = $parser->parseFile('./tests/input/netscape_basic.htm');
var_dump($bookmarks);

Output:

object(Folder)#273 (3) {
  ["title"]=>
  string(4) "Root"
  ["parent"]=>
  NULL
  ["content"]=>
  array(1) {
    [0]=>
    object(Folder)#279 (3) {
      ["title"]=>
      string(13) "Bookmarks bar"
      ["parent"]=>
      *RECURSION*
      ["content"]=>
      array(8) {
        [0]=>
        object(Page)#277 (6) {
          ["uri"]=>
          string(19) "https://private.tld"
          ["title"]=>
          string(12) "Secret stuff"
          ["note"]=>
          string(52) "Super-secret stuff you're not supposed to know about"
          ["tags"]=>
          string(13) "bookmarks bar"
          ["time"]=>
          int(1563348673)
          ["pub"]=>
          string(1) "0"
        }
        [1]=>
        object(Page)#277 (6) {
          ["uri"]=>
          string(17) "http://public.tld"
          ["title"]=>
          string(12) "Public stuff"
          ["note"]=>
          string(0) ""
          ["tags"]=>
          string(13) "bookmarks bar"
          ["time"]=>
          int(1563348673)
          ["pub"]=>
          string(1) "0"
        }
      }
    }
  }
}
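
Because the result is a tree, listing every bookmark means walking it recursively. The sketch below shows one way to do that. It assumes the title, content, and uri properties are publicly readable, as the dump above suggests, and it uses stand-in classes (SketchFolder, SketchPage) and a hypothetical helper (sketchFlatten) so that it runs without the library.

```php
<?php
// Stand-in classes for illustration only. They mirror property names visible
// in the dump above and are NOT the library's own Folder and Page classes.
class SketchFolder
{
    public $title;
    public $parent;
    public $content = [];

    public function __construct(string $title, ?SketchFolder $parent = null)
    {
        $this->title = $title;
        $this->parent = $parent;
    }
}

class SketchPage
{
    public $uri;
    public $title;

    public function __construct(string $uri, string $title)
    {
        $this->uri = $uri;
        $this->title = $title;
    }
}

/**
 * Recursively collect every page under $folder, recording its folder path.
 */
function sketchFlatten($folder, string $path = ''): array
{
    $path = $path === '' ? $folder->title : $path . ' / ' . $folder->title;
    $pages = [];
    foreach ($folder->content as $item) {
        if (isset($item->content)) {
            // Anything carrying a "content" property is treated as a sub-folder.
            $pages = array_merge($pages, sketchFlatten($item, $path));
        } else {
            $pages[] = ['folder' => $path, 'title' => $item->title, 'uri' => $item->uri];
        }
    }

    return $pages;
}

// Rebuild a small tree shaped like the example output.
$root = new SketchFolder('Root');
$bar = new SketchFolder('Bookmarks bar', $root);
$root->content[] = $bar;
$bar->content[] = new SketchPage('https://private.tld', 'Secret stuff');
$bar->content[] = new SketchPage('http://public.tld', 'Public stuff');

foreach (sketchFlatten($root) as $page) {
    echo $page['folder'], ': ', $page['title'], ' <', $page['uri'], ">\n";
}
```

With the real parser, the object returned by parseFile() would presumably take the place of $root, provided its properties are accessible in the same way.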
