Elia Pinto | 29 Nov 2011 10:58

[PATCH] Exclude filename from .gz files

From: Elia Pinto <yersinia.spiros <at> gmail.com>

When you run createrepo, the original filenames of the metadata
files, including the full path, are stored in the header of each
.gz metadata file.

The path becomes part of the checksum of the gzip file
itself.  So, if you gunzip the file and re-gzip it,
you get a different checksum.

Based on the original observation of Dennis Gregorovic
---
 createrepo/utils.py |   20 ++++++--------------
 1 files changed, 6 insertions(+), 14 deletions(-)

diff --git a/createrepo/utils.py b/createrepo/utils.py
index c5cec64..d52d070 100644
--- a/createrepo/utils.py
+++ b/createrepo/utils.py
@@ -40,22 +40,14 @@ def _(args):

 class GzipFile(gzip.GzipFile):
     def _write_gzip_header(self):
+        # Generate a header that is easily reproduced with gzip -9 -n on
+        # an unix-like system
         self.fileobj.write('\037\213')             # magic header
         self.fileobj.write('\010')                 # compression method
-        if hasattr(self, 'name'):
-            fname = self.name[:-3]
-        else:
[...]
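
The effect described in the commit message can be reproduced with Python's standard gzip module. The following is only a minimal sketch, not the createrepo code: the payload and the helper name gzip_bytes are made up for illustration, and the timestamp is pinned to zero so that the stored name is the only difference between the two archives.

```python
import gzip
import hashlib
import io


def gzip_bytes(data, filename):
    """Compress data in memory, recording `filename` in the gzip header.

    Hypothetical helper for illustration; not part of createrepo.
    """
    buf = io.BytesIO()
    # mtime=0 pins the timestamp field so only the stored name can differ.
    with gzip.GzipFile(filename=filename, mode="wb", compresslevel=9,
                       fileobj=buf, mtime=0) as gz:
        gz.write(data)
    return buf.getvalue()


payload = b"<otherdata/>\n" * 100  # stand-in for repodata XML (assumption)

with_name = gzip_bytes(payload, "other.xml")
without_name = gzip_bytes(payload, "")

# Byte 3 of a gzip header is the flag field; bit 0x08 (FNAME) marks a stored name.
print("FNAME flag with name:   ", bool(with_name[3] & 0x08))
print("FNAME flag without name:", bool(without_name[3] & 0x08))

# Same payload, different archive bytes, hence different checksums.
print(hashlib.sha1(with_name).hexdigest())
print(hashlib.sha1(without_name).hexdigest())
assert gzip.decompress(with_name) == gzip.decompress(without_name) == payload
```

Both archives decompress to the same data, yet their checksums differ because the name is part of the archive bytes. Leaving the name out is what the -n option of gzip does on the command line.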

Dennis Gregorovic | 30 Nov 2011 12:45

Re: [PATCH] Exclude filename from .gz files

A quick test on my local machine confirms that this patch does the
trick.

# createrepo .
# cd repodata/
# sha1sum other.xml.gz 
1eb13a25318339d9e8157f0bf80419c019fa5000  other.xml.gz
# gunzip other.xml.gz 
# gzip -9 -n other.xml 
# sha1sum other.xml.gz 
1eb13a25318339d9e8157f0bf80419c019fa5000  other.xml.gz

-- Dennis
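
The same round trip as in the shell session above can be sketched in Python. This is a hedged illustration under stated assumptions: reproducible_gzip is a hypothetical helper and the payload is invented. It only checks that decompressing and recompressing with an empty filename and a fixed timestamp gives a stable checksum within Python; it makes no claim about matching the output of the command-line gzip byte for byte.

```python
import gzip
import hashlib
import io


def reproducible_gzip(data):
    """Gzip data with no stored filename and a zero timestamp.

    Hypothetical helper for illustration; not part of createrepo.
    """
    buf = io.BytesIO()
    with gzip.GzipFile(filename="", mode="wb", compresslevel=9,
                       fileobj=buf, mtime=0) as gz:
        gz.write(data)
    return buf.getvalue()


original = reproducible_gzip(b"<otherdata/>\n" * 100)  # invented payload
digest_before = hashlib.sha1(original).hexdigest()

# Equivalent of "gunzip" followed by compressing the same bytes again.
recompressed = reproducible_gzip(gzip.decompress(original))
digest_after = hashlib.sha1(recompressed).hexdigest()

print(digest_before)
print(digest_after)
print("checksums match:", digest_before == digest_after)
```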

On Tue, 2011-11-29 at 04:58 -0500, Elia Pinto wrote:
> From: Elia Pinto <yersinia.spiros <at> gmail.com>
> 
> When you run createrepo, the original filenames,
> including full path, of the files are stored
> in the header of the .gz metadata
> 
> The path becomes part of the checksum of the gzip file
> itself.  So, if you gunzip the file and re-gzip it,
> you get a different checksum.
> 
> Based on the original observation of Dennis Gregorovic
> ---
>  createrepo/utils.py |   20 ++++++--------------
>  1 files changed, 6 insertions(+), 14 deletions(-)
> 
[...]

James Antill | 30 Nov 2011 17:39

Re: [PATCH] Exclude filename from .gz files

On Tue, 2011-11-29 at 04:58 -0500, Elia Pinto wrote:
> From: Elia Pinto <yersinia.spiros <at> gmail.com>
> 
> When you run createrepo, the original filenames,
> including full path, of the files are stored
> in the header of the .gz metadata
> 
> The path becomes part of the checksum of the gzip file
> itself.  So, if you gunzip the file and re-gzip it,
> you get a different checksum.
> 
> Based on the original observation of Dennis Gregorovic

 ACK, and pushed.

> ---
>  createrepo/utils.py |   20 ++++++--------------
>  1 files changed, 6 insertions(+), 14 deletions(-)
> 
> diff --git a/createrepo/utils.py b/createrepo/utils.py
> index c5cec64..d52d070 100644
> --- a/createrepo/utils.py
> +++ b/createrepo/utils.py
> @@ -40,22 +40,14 @@ def _(args):
>  
>  class GzipFile(gzip.GzipFile):
>      def _write_gzip_header(self):
> +        # Generate a header that is easily reproduced with gzip -9 -n on
> +        # an unix-like system
>          self.fileobj.write('\037\213')             # magic header
[...]

