Brian Sanders | 8 Oct 04:25

New to LUA, trying to read from a file

Hello,  I am new to Lua and have been writing some basic scripts to get a feel for it.  I tried to make a basic script that parses a log file and displays simple output.  I have the following code, which I thought I understood just fine.

print ("opening file for reading")
logfile = io.open("system.log","r")
logstring = logfile:read("*all")
print(logstring)

What confuses me is this: I can create a new text document, type a few words in it, run this script against it (unless, among my many variations, I posted the wrong one), and get the text back out.  Once past this, I would do some simple pattern matching on the log file.  Unfortunately, it only works on a text document I create myself.  When I use the actual log file I am trying to parse, it prints only a space, a square, and then the first character in the file.

Opened in WordPad, the log file looks normal, with a line break after each log entry.  In Notepad, however, the line breaks are shown as squares.  That leads me to believe the file's formatting is involved, but I really don't know.  Can anyone point me in the right direction?  I just don't see how the line breaks could be the problem when the script does not even get that far; it stops after the first character.

Thanks for helping a new guy out,
Brian



--
"Faithless is he, who says 'farewell', when the path darkens
"you just keep on trying till you run out of cake"
Mike Crowe | 8 Oct 04:45

Re: New to LUA, trying to read from a file

Hi Brian,

Couple of things:

1)  How big is your log file?  Are you trying to read too much at once?  You may want to read line by line with "*line" instead of "*all", though that is less efficient.

2)  Here's a recent snippet I used:

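local text = ""                       -- accumulator; must start as a string, not nil
local file = assert(io.open(feed, "r"))  -- feed holds the file name
while true do
  local line = file:read("*l")        -- "*l" reads one line without its newline
  if line == nil then break end       -- nil means end of file
  text = line .. "\n" .. text         -- prepends, so lines end up in reverse order
end
file:close()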



James Dennett | 8 Oct 04:46

Re: New to LUA, trying to read from a file

What's the format of the log file, exactly?  In particular, what charset is it using (range and encoding), and what newline format?  A hex dump of (the start of) the file can be very helpful in guessing that, if it's not documented somewhere.  You mention "notepad", so I might guess that you're on Windows, which increases the probability that the file is some variant of UTF16, possibly UTF-16LE with a BOM (byte order mark).  But that's a wild guess, and not likely to be right.

Unfortunately the simple term "text file" covers a whole family of formats.
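
For what it's worth, a hex dump of the start of the file can be produced from Lua itself. This is only a minimal sketch: `hex_prefix` is a made-up helper name, and "system.log" is simply the file name from the original post.

```lua
-- Format the first n bytes of a string as space-separated hex pairs.
local function hex_prefix(data, n)
  local out = {}
  for i = 1, math.min(n, #data) do
    out[#out + 1] = string.format("%02X", data:byte(i))
  end
  return table.concat(out, " ")
end

-- "rb" asks for binary mode, so no newline translation happens on Windows.
local f = io.open("system.log", "rb")
if f then
  print(hex_prefix(f:read(16) or "", 16))
  f:close()
end
```

The first two or three bytes are usually the interesting part, since that is where a byte order mark would show up.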

-- James

Brian Sanders | 8 Oct 12:24

Re: New to LUA, trying to read from a file

Wow, that was fast...  Let me see if I got all this correct.

First, as for the format of the log file, I may have to look at it in a hex editor.  I don't know what format was used when writing the file; I just know it is saved as .log and it seems generally expected that WordPad can view it just fine.  So perhaps more on that to come; it sounds like a place to start.

> shouldn't it be "*a"? Check
> http://www.lua.org/manual/5.1/manual.html#pdf-file:read ...
> Also, you can try to print the length of the string with
> print(#logstring).

Using "*a" produced the same results.  I remember trying multiple ways because of my Google searching.  If one is correct and the other is not, that is one thing I was hoping to learn from this little experiment :)

I put the print statement in to see the length of the string.  It returns 2345630, even though it only prints those first few characters.  I found that interesting.  Am I getting too large a size for a single string?  I could process each line individually...
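
A large length next to only a few visible characters is consistent with zero bytes in the data: a Lua string records its own length, while output routines that treat the data as a C string can stop at the first zero byte. A minimal sketch, using made-up sample bytes rather than the real log file:

```lua
-- Hypothetical sample: FF FE, then "X" and "0", each followed by a zero byte.
local sample = string.char(0xFF, 0xFE, 0x58, 0x00, 0x30, 0x00)

-- The length operator counts every byte, zeros included.
print(#sample)                                 --> 6

-- Plain find (fourth argument true) locates the first zero byte.
local first_zero = sample:find("\0", 1, true)
print(first_zero)                              --> 4
```

So the string size itself is not the problem; the question is what the bytes are.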

> Couple of things:
>
> 1)  How big is your log file?  Are you trying to read too much?  You may want to read line-by-line with "*line" instead of "*all", though it is more inefficient.
>
> 2)  Here's a recent snippet I used:
>
> local file = io.open(feed,"r")
> while true do
>
> local line = file:read("*l")
> if line == nil then break end
> text = line .. "\n" .. text
> end

Well, the log file is 2,291 KB, but I tried a very short one, only about 4 KB, with exactly the same results.  I tried this line-by-line code to see how it would turn out.  If I print text after the loop ends, I get exactly the same output as when I read everything at once.  I then printed the length of the string, as suggested earlier, and got 4.  I even added a print of line inside the while loop to see whether I could simply print each line as it is read.  It prints exactly the same text, once.  It appears that reading a line at a time stops after the first attempt, and that first result still contains output I don't understand.

So I am probably back to the question of how exactly this file is formatted.  I believe the logs were written with the idea of opening them in WordPad for reading, but I will see about getting a hex editor and comparing this file to a standard text file, which I have seen work.





Tim Channon | 8 Oct 12:29

Re: New to LUA, trying to read from a file

Brian Sanders wrote:

> logfile = io.open("system.log","r")

Maybe "rb" would help

Brian Sanders | 8 Oct 12:37

Re: New to LUA, trying to read from a file

Yeah, I considered the binary as well...  but it didn't make a difference.  Got the exact same output.

Looking at this file in a hex editor, I see some interesting stuff.  The file starts with FF EE, followed by the standard characters.  Between every character is 00, which the other window shows as just a square.  It appears that these 00s are every other byte in the file.  I can't control how these files are written, but does knowing this tell someone what might be going on?

for example

FF EE 58 00 30 00
??  ??  [   ??  0  ??

So every other character from then on is expected and is what I see in the file.  I just don't understand the starting bytes, or the 00 after every character.





Re: New to LUA, trying to read from a file

On Wednesday 08 October 2008, Brian Sanders wrote:
> Looking at this fine in a hex editor, I see some interesting stuff.  The
> file starts with FF EE, then begins with the standard characters.  Between
> every character is 00, which in the other window translates to just a
> square.  It appears that these 00's are every other character in the file.
> I can't control how these files are written, but does knowing this tell
> someone what might be going on?

yep, that's UTF-16 (or UCS-2)

I wouldn't consider that a 'text file'

-- 
Javier
Klaus Ripke | 8 Oct 12:53

Re: New to LUA, trying to read from a file

On Wed, Oct 08, 2008 at 06:37:24AM -0400, Brian Sanders wrote:
> Looking at this fine in a hex editor, I see some interesting stuff.  The
> file starts with FF EE, then begins with the standard characters.  Between
Hmm, this should be FF EE, making it UTF-16 or UCS-2 with BOM.
http://en.wikipedia.org/wiki/UTF-16/UCS-2

FF FE is the byte order mark, telling you it's little endian
(lower byte first, the Unicode value of the BOM is U+FEFF),
which comes as no surprise on a Wintel box..
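
As a rough illustration of the byte order mark check, here is a minimal sketch. `detect_bom` is a made-up helper, not part of Lua, and it only knows the three most common marks (FF FE followed by 00 00 would actually indicate UTF-32LE, which this sketch does not distinguish).

```lua
-- Return an encoding name and the number of BOM bytes to skip,
-- or nil and 0 when no known byte order mark is present.
local function detect_bom(data)
  local b1, b2, b3 = data:byte(1, 3)
  if b1 == 0xFF and b2 == 0xFE then return "UTF-16LE", 2 end
  if b1 == 0xFE and b2 == 0xFF then return "UTF-16BE", 2 end
  if b1 == 0xEF and b2 == 0xBB and b3 == 0xBF then return "UTF-8", 3 end
  return nil, 0
end

-- Hypothetical sample bytes: FF FE, then "X" in UTF-16LE.
local name, skip = detect_bom(string.char(0xFF, 0xFE, 0x58, 0x00))
print(name, skip)   -- name is "UTF-16LE", skip is 2
```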

> every character is 00, which in the other window translates to just a
> square.  It appears that these 00's are every other character in the file.
> FF EE 58 00 30 00
> ??  ??  [   ??  0  ??
> 
> So every other character from then on is expected and is what I see in the
> file.  I just don't know the starting, or the 00 every other character.

As long as the actual character values are in Latin-1,
the high byte (every other byte) is always 0, so you can simply ignore it
and discard the two-byte BOM (sure you have FF EE?).

HTH
Klaus

Klaus Ripke | 8 Oct 12:59

Re: New to LUA, trying to read from a file

On Wed, Oct 08, 2008 at 12:53:54PM +0200, Klaus Ripke wrote:
> On Wed, Oct 08, 2008 at 06:37:24AM -0400, Brian Sanders wrote:
> > file starts with FF EE, then begins with the standard characters.  Between
> Hmm, this should be FF EE, making it UTF-16 or UCS-2 with BOM.
... should be FF FE, making it UTF-16 or UCS-2 with BOM. sry

Brian Sanders | 8 Oct 13:30

Re: New to LUA, trying to read from a file

Hmm... so if it is in either UTF-16 or UCS-2 with BOM... is there any way for me to use these log files with a Lua script?  I guess it is good to know that I did understand the Lua tutorials; it was my input file that I was not looking at closely enough.

Matthew Wild | 8 Oct 13:37

Re: New to LUA, trying to read from a file

On Wed, Oct 8, 2008 at 12:30 PM, Brian Sanders <brian.sanders <at> gmail.com> wrote:
> Hmm... so if it is in either UTF-16 or UCS-2 with BOM... is there any way
> for me to use these log files with a LUA script?  I guess it is good to know
> that I did understand the LUA tutorials, it was my input file I was not
> looking closely enough at.
>

It's a hack, and there is probably a nice(r) way of doing it, but try:

logstring = logstring:sub(3):gsub("%z", "")

It will at least remove the zeros that stop it from printing, but if
you have non-latin characters then they might get messed up.
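
To illustrate both halves of that, here is a small sketch on made-up sample bytes. `strip_zeros` is a hypothetical, version-neutral stand-in for gsub("%z", ""), written this way only so that it does not depend on how a given Lua version treats %z.

```lua
-- Remove every zero byte from a string.
local function strip_zeros(s)
  return (s:gsub(".", function(c)
    if c:byte() == 0 then return "" end   -- returning nothing keeps the byte
  end))
end

-- Hypothetical sample: BOM, then "Hi" encoded as UTF-16LE.
local latin = string.char(0xFF, 0xFE, 0x48, 0x00, 0x69, 0x00)
print(strip_zeros(latin:sub(3)))          --> Hi

-- U+20AC (euro sign) is AC 20 in UTF-16LE: there is no zero byte to strip,
-- so both bytes survive and no longer form a meaningful character.
local euro = string.char(0xFF, 0xFE, 0xAC, 0x20)
print(#strip_zeros(euro:sub(3)))          --> 2
```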

Matthew.

李辉 | 8 Oct 14:07

Re: New to LUA, trying to read from a file

In liolib.c, line 291 (version 2.1.3), in the read_line function:

it uses "fgets(p, LUAL_BUFFERSIZE, f)" to read from the file (line 297),

but it uses "l = strlen(p);" to get the length (line 301).

So for your file, although fgets reads many bytes (say n), strlen returns 3 for your data "FF EE 58 00 30 00....", because it stops at the first 00 byte.

The length of this read is therefore set to 3, so you get 3 in the Lua script.

But the file position has advanced by n bytes, so n-3 bytes are lost, and in the end you get only a few characters.

Maybe it's a bug in Lua?




--
同洲 李辉
Tel:0755-26990000-7741
Mobile: 13631656753

Re: New to LUA, trying to read from a file

> maybe it`s a bug of lua?

No. The notion of lines only makes sense for text files. If you read
binary files (defined as those that contain unprintable bytes -- not
chars), you'll probably get weird results.

Brian Sanders | 8 Oct 14:15

Re: New to LUA, trying to read from a file

Thanks, I see what you did and I will give that a try!  I may not get back to this little project till tomorrow, but I will try to give you a quick update on how it goes.

Thanks again!

Klaus Ripke | 8 Oct 14:55

Re: New to LUA, trying to read from a file

On Wed, Oct 08, 2008 at 12:37:39PM +0100, Matthew Wild wrote:
> It's a hack, and there is probably a nice(r) way of doing it, but try:
> 
> logstring = logstring:sub(3):gsub("%z", "")
> 
> It will at least remove the zeros that stop it from printing, but if
> you have non-latin characters then they might get messed up.

a bit cleaner (and more expensive) would be something like

s = s:gsub('(.)(.)', function (lo,hi) return hi:byte()==0 and lo or '?' end)

this transforms all characters in the Latin-1 subset to their
Latin-1 code and all others, including the nasty BOM, into a '?'

conversion of UCS-2 to UTF-8 can also easily be done in Lua
(although using iconv is probably considerably faster, if you have it):

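local format, floor = string.format, math.floor
-- encode one code point as UTF-8 (named utf8_encode so it does not
-- shadow the utf8 library of Lua 5.3 and later)
function utf8_encode (i) -- BMP only
  if i<128 then return format("%c", i) end
  if i<2048 then return format("%c%c", 192+floor(i/64), 128+i%64) end
  local j=floor(i/4096)
  i = i-j*4096
  return format("%c%c%c", 224+j, 128+floor(i/64), 128+i%64)
end
-- lo and hi are one-character strings, so take their byte values first
s = s:gsub('(.)(.)', function (lo,hi) return utf8_encode(hi:byte()*256+lo:byte()) end)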

make sure to read your file in binary chunks of even size.
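
A minimal sketch of what reading in even-sized binary chunks could look like, combined with the Latin-1 conversion from above. The function names and the 4096-byte chunk size are arbitrary choices for illustration, and the sketch assumes little-endian UTF-16 from a regular file.

```lua
-- Decode UTF-16LE bytes to single-byte text: code units below 256 are kept
-- (Latin-1 range), everything else becomes "?". Expects an even byte count.
local function utf16le_to_latin1(bytes)
  return (bytes:gsub("(.)(.)", function(lo, hi)
    return hi:byte() == 0 and lo or "?"
  end))
end

-- Read a whole file in binary mode, 4096 bytes (an even size) at a time,
-- so that a two-byte code unit is never split between chunks.
local function read_utf16le(path)
  local f, err = io.open(path, "rb")
  if not f then return nil, err end
  local parts, first = {}, true
  while true do
    local chunk = f:read(4096)
    if not chunk then break end
    if first and chunk:sub(1, 2) == string.char(0xFF, 0xFE) then
      chunk = chunk:sub(3)            -- drop the byte order mark
    end
    first = false
    parts[#parts + 1] = utf16le_to_latin1(chunk)
  end
  f:close()
  return table.concat(parts)
end

-- "system.log" is the file name from the original post.
local text = read_utf16le("system.log")
if text then print(#text) end
```

Collecting the pieces in a table and joining them once with table.concat avoids repeatedly copying a growing string for a file of a couple of megabytes.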

cheers
Klaus

David Given | 8 Oct 13:58

Re: New to LUA, trying to read from a file

Brian Sanders wrote:
> Hmm... so if it is in either UTF-16 or UCS-2 with BOM... is there any
> way for me to use these log files with a LUA script?  I guess it is good
> to know that I did understand the LUA tutorials, it was my input file I
> was not looking closely enough at.

You need to convert it somehow into UTF-8. There are a number of Lua
addons for doing this sort of transcoding, but if you've got Cygwin,
it's probably easier just to use the following command line:

iconv -f utf-16 -t utf-8 fnord.log > fnord.txt

...then open 'fnord.txt'.

You may even be able to do this:

local fp = io.popen("iconv -f utf-16 -t utf-8 fnord.log")
local text = fp:read("*all")
fp:close()

...but I forget whether that works on Windows.

(This whole problem is due to Windows having a rather different idea of
what 'plain text' means to the rest of the world; see
http://en.wikipedia.org/wiki/Bush_hid_the_facts for a rather amusing
consequence...)

-- 
David Given
dg <at> cowlark.com

