Rupert Swarbrick | 13 Jan 2011 00:34

[PATCH 0/5] Info parsing improvements

The first two patches fix bug 585253 (and simplify info parsing
significantly).

The last one massively speeds up loading large info files (by not
doing it!).

Rupert Swarbrick (5):
  Fix get_value_after in yelp-info-parser.c
  Rewrite the way we read in the info file & calculate nodes' offsets.
  Use glib's g_uri_unescape_string instead of our own decode_url.
  Fixed a couple of compiler warnings and a return type.
  Parse info files one page at a time.

 libyelp/yelp-document.c      |   16 +-
 libyelp/yelp-document.h      |    2 +-
 libyelp/yelp-info-document.c |  302 +++++++++------
 libyelp/yelp-info-parser.c   |  912 +++++++++++++++---------------------------
 libyelp/yelp-info-parser.h   |   18 +-
 libyelp/yelp-uri.c           |   62 +---
 stylesheets/info2html.xsl.in |   69 ++--
 7 files changed, 557 insertions(+), 824 deletions(-)

-- 
1.7.2.3
Rupert Swarbrick | 5 Jan 2011 16:25

[PATCH 1/5] Fix get_value_after in yelp-info-parser.c

As it was, this was returning silly things for at least the last page
of each info file. Also, it had unnecessary string copying.
---
 libyelp/yelp-info-parser.c |   69 +++++++++++++++++++++++++++----------------
 1 files changed, 43 insertions(+), 26 deletions(-)

diff --git a/libyelp/yelp-info-parser.c b/libyelp/yelp-info-parser.c
index edd3812..7d174a8 100644
--- a/libyelp/yelp-info-parser.c
+++ b/libyelp/yelp-info-parser.c
@@ -547,43 +547,60 @@ static GHashTable
 	return table;
 }

-static char
-*get_value_after (char *source, char *required)
+/*
+  Look for strings in source by key. For example, we extract "blah"
+  from "Node: blah," when the key is "Node: ". To know when to stop,
+  there are two strings: end and cancel.
+
+  If we find a character from end first, return a copy of the string
+  up to (not including) that character. If we find a character of
+  cancel first, return NULL. If we find neither, return the rest of
+  the string.
+
+  cancel can be NULL, in which case, we don't do its test.
+ */
+static char*
+get_value_after_ext (const char *source, const char *key,
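The comment in the hunk above describes the lookup rule in words. A minimal
sketch of that rule in plain C follows. It is not the patch's code: the name
value_after_sketch is hypothetical, it uses malloc rather than GLib
allocation, and returning NULL when the key is missing is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the behaviour the patch comment describes.
   Returns a newly allocated string (free with free()) or NULL. */
static char *
value_after_sketch (const char *source, const char *key,
                    const char *end, const char *cancel)
{
    assert (source != NULL && key != NULL && end != NULL);

    const char *start = strstr (source, key);
    if (start == NULL)
        return NULL;            /* assumption: a missing key yields NULL */
    start += strlen (key);

    /* Distance to the first character from end; equals the length of the
       rest of the string if none is found. */
    size_t end_len = strcspn (start, end);

    /* A character from cancel that comes first aborts the lookup. */
    if (cancel != NULL && strcspn (start, cancel) < end_len)
        return NULL;

    char *copy = malloc (end_len + 1);
    if (copy == NULL)
        return NULL;
    memcpy (copy, start, end_len);
    copy[end_len] = '\0';
    return copy;
}
```

With key "Node: ", end "," and cancel "\n", the input "Node: blah, Next: foo"
gives "blah", and an input whose value contains a newline before the comma
gives NULL.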

Rupert Swarbrick | 6 Jan 2011 13:50

[PATCH 2/5] Rewrite the way we read in the info file & calculate nodes' offsets.

This fixes a bug with how node2offset, offsets2pages etc. were
calculated when reading in a file with an indirect map. It also makes
the logic much simpler and (I think) the code is no less efficient.
---
 libyelp/yelp-info-parser.c |  298 ++++++++++++++++++++------------------------
 1 files changed, 134 insertions(+), 164 deletions(-)

diff --git a/libyelp/yelp-info-parser.c b/libyelp/yelp-info-parser.c
index 7d174a8..5ecdc5a 100644
--- a/libyelp/yelp-info-parser.c
+++ b/libyelp/yelp-info-parser.c
@@ -33,8 +33,6 @@
 #include "yelp-debug.h"

 
-typedef struct _TagTableFix TagTableFix;
-
 GtkTreeIter *         find_real_top                      (GtkTreeModel *model, 
 							  GtkTreeIter *it);
 GtkTreeIter *         find_real_sibling                  (GtkTreeModel *model,
@@ -53,9 +51,6 @@ gboolean              resolve_frag_id                    (GtkTreeModel *model,
 							  GtkTreePath *path, 
 							  GtkTreeIter *iter,
 							  gpointer data);
-void                  fix_tag_table                      (gchar *offset, 
-							  gpointer page, 
-							  TagTableFix *a);
 void   		      info_process_text_notes            (xmlNodePtr *node, 
 							  gchar *content,
 							  GtkTreeStore
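The commit message mentions node offsets in a file with an indirect map. As a
rough illustration only, here is one way to pick the subfile that contains a
given node offset. It assumes a simplified model: each indirect entry records
the offset at which a subfile starts, and entries are sorted by that offset.
The type IndirectEntry and the function find_subfile_sketch are hypothetical
and do not come from yelp. Any adjustment for per-subfile headers is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one line of an indirect map. */
typedef struct {
    const char *subfile;   /* name of the subfile */
    long        start;     /* offset at which this subfile begins */
} IndirectEntry;

/* Return the last entry whose start is <= node_offset, or NULL if the
   offset lies before the first subfile. Entries must be sorted by start. */
static const IndirectEntry *
find_subfile_sketch (const IndirectEntry *entries, size_t n, long node_offset)
{
    assert (entries != NULL || n == 0);

    const IndirectEntry *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (entries[i].start > node_offset)
            break;
        best = &entries[i];
    }
    return best;
}
```

A linear scan is enough here because an indirect map usually has few entries;
a binary search would be a straightforward substitution.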

Rupert Swarbrick | 10 Jan 2011 22:00

[PATCH 3/5] Use glib's g_uri_unescape_string instead of our own decode_url.

---
 libyelp/yelp-uri.c |   62 +---------------------------------------------------
 1 files changed, 1 insertions(+), 61 deletions(-)

diff --git a/libyelp/yelp-uri.c b/libyelp/yelp-uri.c
index aa467d6..2069622 100644
--- a/libyelp/yelp-uri.c
+++ b/libyelp/yelp-uri.c
@@ -941,66 +941,6 @@ resolve_man_uri (YelpUri *uri)
     }
 }

-/*
-  Return 1 if ch is a number from 0 to 9 or a letter a-f or A-F and 0
-  otherwise. This is sort of not utf8-safe, but since we are only
-  looking for 7-bit things, it doesn't matter.
- */
-static int
-is_hex (gchar ch)
-{
-    if (((48 <= ch) && (ch <= 57)) ||
-        ((65 <= ch) && (ch <= 70)) ||
-        ((97 <= ch) && (ch <= 102)))
-        return 1;
-    return 0;
-}
-
-/*
-  Return a newly allocated string, where %ab for a,b in [0, f] is
-  replaced by the character it represents.
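For readers unfamiliar with what is being replaced, the following is a small
standard-C sketch of percent-decoding, the job this patch hands over to
GLib's g_uri_unescape_string. It is not a reconstruction of the removed
decode_url, whose body is cut off above. The names hex_value and
percent_decode_sketch are hypothetical. Unlike this sketch, which copies
malformed escapes through unchanged, the GLib function may reject such input,
so check its documentation before relying on either behaviour.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Value of a hex digit, or -1 if ch is not one. Character literals are
   used instead of the raw ASCII codes seen in the removed is_hex. */
static int
hex_value (char ch)
{
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    return -1;
}

/* Return a newly allocated copy of escaped with each valid %XY replaced by
   the byte it encodes. Invalid escapes are copied as-is (an assumption). */
static char *
percent_decode_sketch (const char *escaped)
{
    assert (escaped != NULL);

    /* The decoded string is never longer than the input. */
    char *out = malloc (strlen (escaped) + 1);
    if (out == NULL)
        return NULL;

    char *dst = out;
    const char *src = escaped;
    while (*src != '\0') {
        int hi, lo;
        /* Short-circuit evaluation keeps us from reading past the NUL. */
        if (src[0] == '%' &&
            (hi = hex_value (src[1])) >= 0 &&
            (lo = hex_value (src[2])) >= 0) {
            *dst++ = (char) (hi * 16 + lo);
            src += 3;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    return out;
}
```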

Rupert Swarbrick | 12 Jan 2011 02:09

[PATCH 4/5] Fixed a couple of compiler warnings and a return type.

---
 libyelp/yelp-document.c |   16 ++++++++++------
 libyelp/yelp-document.h |    2 +-
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/libyelp/yelp-document.c b/libyelp/yelp-document.c
index 215586e..6876496 100644
--- a/libyelp/yelp-document.c
+++ b/libyelp/yelp-document.c
@@ -205,6 +205,7 @@ yelp_document_get_for_uri (YelpUri *uri)
     case YELP_URI_DOCUMENT_TYPE_NOT_FOUND:
     case YELP_URI_DOCUMENT_TYPE_EXTERNAL:
     case YELP_URI_DOCUMENT_TYPE_ERROR:
+    case YELP_URI_DOCUMENT_TYPE_UNRESOLVED:
         break;
     }

@@ -625,7 +626,7 @@ yelp_document_get_page_icon (YelpDocument *document,
     return ret;
 }

-gchar *
+void
 yelp_document_set_page_icon (YelpDocument *document,
                              const gchar  *page_id,
                              const gchar  *icon)
@@ -646,8 +647,9 @@ yelp_document_request_page (YelpDocument         *document,
 			    YelpDocumentCallback  callback,
 			    gpointer              user_data)
 {

Rupert Swarbrick | 12 Jan 2011 12:43

[PATCH 5/5] Parse info files one page at a time.

With the previous implementation, parsing a large info file takes a
significant amount of time (several seconds). But most of this time is
spent parsing the thing to xml.

This version instead quickly reads the file into memory (negligible
time required) and then only parses pages to xml when they're needed.
---
 libyelp/yelp-info-document.c |  302 +++++++++++-------
 libyelp/yelp-info-parser.c   |  721 ++++++++++++++----------------------------
 libyelp/yelp-info-parser.h   |   18 +-
 stylesheets/info2html.xsl.in |   69 ++---
 4 files changed, 456 insertions(+), 654 deletions(-)

diff --git a/libyelp/yelp-info-document.c b/libyelp/yelp-info-document.c
index acfb33f..1c16b20 100644
--- a/libyelp/yelp-info-document.c
+++ b/libyelp/yelp-info-document.c
@@ -40,23 +40,38 @@
 #define STYLESHEET DATADIR"/yelp/xslt/info2html.xsl"

 typedef enum {
-    INFO_STATE_BLANK,   /* Brand new, run transform as needed */
+    INFO_STATE_READY,   /* Brand new, run transform as needed */
     INFO_STATE_PARSING, /* Parsing/transforming document, please wait */
-    INFO_STATE_PARSED,  /* All done, if we ain't got it, it ain't here */
     INFO_STATE_STOP     /* Stop everything now, object to be disposed */
 } InfoState;

 typedef struct _YelpInfoDocumentPrivate  YelpInfoDocumentPrivate;
 struct _YelpInfoDocumentPrivate {
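The commit message for this patch describes reading the whole file into
memory up front and converting a page to XML only when it is requested. A
minimal sketch of that lazy pattern follows. All names (LazyPage,
expensive_parse, lazy_page_get, parse_count) are hypothetical and not taken
from yelp, and the "parse" step is a trivial placeholder for the real work.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One page of the in-memory file. raw points into the loaded buffer and is
   not owned by the page; parsed is filled in on first request. */
typedef struct {
    const char *raw;
    size_t      raw_len;
    char       *parsed;   /* NULL until the page is first needed */
} LazyPage;

/* Counts how often the costly step runs, for illustration only. */
static int parse_count = 0;

/* Placeholder for the expensive text-to-XML conversion. */
static char *
expensive_parse (const char *raw, size_t len)
{
    parse_count++;
    /* sizeof covers both tags plus the terminating NUL. */
    char *out = malloc (len + sizeof "<page></page>");
    if (out == NULL)
        return NULL;
    memcpy (out, "<page>", 6);
    memcpy (out + 6, raw, len);
    memcpy (out + 6 + len, "</page>", 8);   /* 7 characters plus NUL */
    return out;
}

/* Return the parsed form of the page, converting it on first use and
   reusing the cached result afterwards. */
static const char *
lazy_page_get (LazyPage *page)
{
    assert (page != NULL);
    if (page->parsed == NULL)
        page->parsed = expensive_parse (page->raw, page->raw_len);
    return page->parsed;
}
```

Loading stays cheap because no page is converted until lazy_page_get is
called for it, and calling it twice for the same page runs the conversion
only once.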

