questions anon | 1 Dec 2011 02:16

ignore NAN in numpy.true_divide()

I am trying to calculate the mean across many netCDF files. I cannot use numpy.mean because there are too many files to concatenate, and I end up with a memory error. I have written the code below to do what I need, but some of my arrays contain a few NaN values. Is there a way to ignore these somewhere in my code? I run into this problem often, so I would love a command that ignores the blanks (NaNs) in my array before I continue to the next processing step.
Any feedback is greatly appreciated.


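import glob
import sys

import netCDF4
import numpy as N
import numpy.ma as MA

MainFolder='/path/to/MainFolder/'   # placeholder: the post does not show how MainFolder is defined

netCDF_list=[]
# collect the .nc files from the January, February and December folders
for dir in glob.glob(MainFolder + '*/01/')+ glob.glob(MainFolder + '*/02/')+ glob.glob(MainFolder + '*/12/'):
        for ncfile in glob.glob(dir + '*.nc'):
            netCDF_list.append(ncfile)

slice_counter=0
running_sum=None
print netCDF_list

for filename in netCDF_list:
        ncfile=netCDF4.Dataset(filename)
        TSFC=ncfile.variables['T_SFC'][:]
        fillvalue=ncfile.variables['T_SFC']._FillValue
        TSFC=MA.masked_values(TSFC, fillvalue)
        ncfile.close()
        for i in xrange(len(TSFC)):             # every time slice, including the last one
                slice_counter +=1
                if running_sum is None:
                        print "Initiating the running total of my variable..."
                        running_sum=MA.array(TSFC[i])   # MA.array keeps the fill-value mask; N.array would drop it
                else:
                        running_sum=MA.add(running_sum, TSFC[i])

TSFC_avg=N.true_divide(running_sum, slice_counter)
N.set_printoptions(threshold=sys.maxsize)   # print the whole array; newer NumPy rejects threshold='nan'
print "the TSFC_avg is:", TSFC_avg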
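One way to ignore NaNs without ever concatenating the files is to keep two accumulators per grid cell: a sum of the valid values and a count of how many valid values were seen. A minimal sketch, assuming every time slice is an equal-shaped 2-D array (plain or masked); the function name running_nanmean is hypothetical:

```python
import numpy as np
import numpy.ma as ma


def running_nanmean(slices):
    """Per-cell mean of equal-shaped slices, ignoring NaN, inf and masked cells.

    Only two arrays (a running sum and a running count) are kept in memory,
    so the slices never need to be concatenated.
    """
    total = None
    count = None
    for s in slices:
        # Mask non-finite values; an existing mask (e.g. from masked_values) is kept.
        s = ma.masked_invalid(s)
        valid = ~ma.getmaskarray(s)
        if total is None:
            total = np.zeros(s.shape, dtype=float)
            count = np.zeros(s.shape, dtype=np.int64)
        total += s.filled(0.0)  # missing cells add 0 to the sum...
        count += valid          # ...and are not counted
    if total is None:
        raise ValueError("no slices were given")
    # Cells that never had a valid value are masked instead of divided by zero.
    return ma.masked_array(total / np.maximum(count, 1), mask=(count == 0))
```

In the script above, the file loop could be turned into a generator that yields each TSFC[i], and its output passed to this function in place of the running_sum / true_divide lines. NumPy's own np.nanmean (added in later releases) computes the same thing, but it needs the whole stack in memory at once.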

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion <at> scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
questions anon | 5 Dec 2011 23:29

Re: ignore NAN in numpy.true_divide()

Maybe I am asking the wrong question, or I could go about this another way.
I have thousands of numpy arrays to go through. Could I just identify which arrays contain NaNs and, for now, ignore those arrays entirely? Is there a simple way to do this?
Any feedback will be greatly appreciated.

David Cournapeau | 5 Dec 2011 23:45

Re: ignore NAN in numpy.true_divide()

On Mon, Dec 5, 2011 at 5:29 PM, questions anon <questions.anon <at> gmail.com> wrote:
> Maybe I am asking the wrong question or could go about this another way.
> I have thousands of numpy arrays to flick through, could I just identify
> which arrays have NAN's and for now ignore the entire array. is there a
> simple way to do this?

Calling np.any(np.isnan(a)) on an array a answers exactly this question: it returns True if a contains at least one NaN.
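That test can be dropped straight into a running sum so that any array containing a NaN is skipped as a whole. A minimal sketch, assuming all arrays share one shape; the function name mean_skipping_nan_arrays is hypothetical:

```python
import numpy as np


def mean_skipping_nan_arrays(arrays):
    """Average equal-shaped arrays, skipping every array that contains a NaN.

    Returns (mean, n_used, n_skipped).
    """
    running_sum = None
    n_used = 0
    n_skipped = 0
    for a in arrays:
        if np.any(np.isnan(a)):
            n_skipped += 1
            continue  # ignore this whole array
        if running_sum is None:
            running_sum = np.array(a, dtype=float)  # copy, so the input is not modified
        else:
            running_sum += a
        n_used += 1
    if n_used == 0:
        raise ValueError("every array contained NaN")
    return running_sum / n_used, n_used, n_skipped
```

Counting the skipped arrays makes it easy to check how much data was thrown away.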

David
questions anon | 6 Dec 2011 03:53

Re: ignore NAN in numpy.true_divide()

Thanks for responding. I have tried several ways of adding the command, one of which is:

        for i in TSFC:
                if N.any(N.isnan(TSFC)):
                        break
                else:
                        pass
but nothing is happening. Is there some particular way I need to add this command? I have posted the whole script below:

netCDF_list=[]

for dir in glob.glob(MainFolder + '*/01/')+ glob.glob(MainFolder + '*/02/')+ glob.glob(MainFolder + '*/12/'):
        #print dir
        for ncfile in glob.glob(dir + '*.nc'):
            netCDF_list.append(ncfile)

slice_counter=0
print netCDF_list
for filename in netCDF_list:
        ncfile=netCDF4.Dataset(filename)
        TSFC=ncfile.variables['T_SFC'][:]
        fillvalue=ncfile.variables['T_SFC']._FillValue
        TSFC=MA.masked_values(TSFC, fillvalue)
        for a in TSFC:
                if N.any(N.isnan(TSFC)):
                        break
                else:
                        pass

        for i in xrange(0,len(TSFC)-1,1):
                        slice_counter +=1
                #print slice_counter
                        try:
                                running_sum=N.add(running_sum, TSFC[i])
                        except NameError:
                                print "Initiating the running total of my variable..."
                                running_sum=N.array(TSFC[i])

TSFC_avg=N.true_divide(running_sum, slice_counter)
N.set_printoptions(threshold='nan')
print "the TSFC_avg is:", TSFC_avg
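In the script above, the break sits inside "for a in TSFC:", so it only leaves that inner loop; the summing loop that follows still runs on the same file. (The test also looks at the whole of TSFC on every pass rather than at a.) Skipping the whole file needs continue in the outer file loop instead. A small sketch comparing the two, using hypothetical in-memory arrays in place of the netCDF files:

```python
import numpy as np


def count_summed_slices(files, use_continue):
    """Count how many time slices reach the summing loop.

    files: list of 3-D arrays (time, y, x), standing in for the T_SFC data
    of each netCDF file.
    use_continue=False mimics the posted script (a break in an inner loop);
    use_continue=True skips a file containing NaN with continue.
    """
    summed = 0
    for TSFC in files:
        if use_continue:
            if np.any(np.isnan(TSFC)):
                continue  # go straight to the next file
        else:
            for a in TSFC:
                if np.any(np.isnan(TSFC)):
                    break  # leaves only this inner loop
        for i in range(len(TSFC)):
            summed += 1  # stands in for the running-sum update
    return summed
```

With one clean file and one file containing a NaN (3 slices each), the break version still sums all 6 slices, while the continue version sums only the 3 clean ones.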




Xavier Barthelemy | 5 Dec 2011 23:50

Re: ignore NAN in numpy.true_divide()

Hi, 
I don't know if it is the best choice, but this is what I do in my code:

for SliceOfToto in Toto:
  indexnonNaN=np.isfinite(SliceOfToto)
  SliceOfTotoWithoutNan=SliceOfToto[indexnonNaN]

and then I perform all the operations I want on that last, NaN-free array.
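Note that indexing with the boolean array returned by np.isfinite gives a 1-D array of the surviving values, not a slice of the original shape. That is fine for statistics over a whole slice, but each value's position on the grid is lost, and two slices with different numbers of NaNs give arrays of different lengths that cannot be added cell by cell. A short illustration with made-up values:

```python
import numpy as np

a = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, np.nan]])
b = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

flat_a = a[np.isfinite(a)]  # 1-D: array([1., 3., 4., 5.]), shape (4,)
flat_b = b[np.isfinite(b)]  # 1-D: all six values, shape (6,)
# The two results have different lengths, so they no longer line up cell by cell.
```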

I hope this answers your question.

Xavier


--
"When the government violates the rights of the people, insurrection is, for the people and for each portion of the people, the most sacred of rights and the most indispensable of duties."

Declaration of the Rights of Man and of the Citizen, article 35, 1793
questions anon | 6 Dec 2011 04:06

Re: ignore NAN in numpy.true_divide()

I have also tried Xavier's suggestion, but I only end up with a single value as my average (instead of an array). I used:

        for a in TSFC:
                indexnonNaN=N.isfinite(a)
                SliceofTotoWithoutNan=a[indexnonNaN]
                print SliceofTotoWithoutNan
        TSFC=SliceofTotoWithoutNan

Entire script:

netCDF_list=[]

for dir in glob.glob(MainFolder + '*/01/')+ glob.glob(MainFolder + '*/02/')+ glob.glob(MainFolder + '*/12/'):
        #print dir
        for ncfile in glob.glob(dir + '*.nc'):
            netCDF_list.append(ncfile)

slice_counter=0
print netCDF_list
for filename in netCDF_list:
        ncfile=netCDF4.Dataset(filename)
        TSFC=ncfile.variables['T_SFC'][:]
        fillvalue=ncfile.variables['T_SFC']._FillValue
        TSFC=MA.masked_values(TSFC, fillvalue)
        for a in TSFC:
                indexnonNaN=N.isfinite(a)
                SliceofTotoWithoutNan=a[indexnonNaN]
                print SliceofTotoWithoutNan
        TSFC=SliceofTotoWithoutNan

        for i in xrange(0,len(TSFC)-1,1):
                        slice_counter +=1
                #print slice_counter
                        try:
                                running_sum=N.add(running_sum, TSFC[i])
                        except NameError:
                                print "Initiating the running total of my variable..."
                                running_sum=N.array(TSFC[i])

TSFC_avg=N.true_divide(running_sum, slice_counter)
N.set_printoptions(threshold='nan')
print "the TSFC_avg is:", TSFC_avg
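A likely reason for the single value: the assignment TSFC=SliceofTotoWithoutNan runs after the "for a in TSFC:" loop has finished, so TSFC ends up holding only the last slice, flattened to 1-D. In the summing loop each TSFC[i] is then a single number, so the running sum is a scalar. A small reproduction with hypothetical data:

```python
import numpy as np

# Hypothetical stand-in for one file's T_SFC data: 3 time slices on a 2 x 2 grid.
TSFC = np.arange(12, dtype=float).reshape(3, 2, 2)
TSFC[0, 0, 0] = np.nan

for a in TSFC:
    indexnonNaN = np.isfinite(a)
    SliceofTotoWithoutNan = a[indexnonNaN]  # 1-D, rebuilt on every pass
TSFC = SliceofTotoWithoutNan                # only the LAST slice survives the loop

# TSFC is now the 1-D array [8., 9., 10., 11.], so TSFC[i] is a single number
# and the running sum below ends up as one scalar rather than a grid.
running_sum = TSFC[0]
for i in range(1, len(TSFC)):
    running_sum = np.add(running_sum, TSFC[i])
```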




Xavier Barthelemy | 6 Dec 2011 04:31

Re: ignore NAN in numpy.true_divide()

Well, I see two solutions:
1- Keep your code as it is, but collect the slices in a Python list (you can stack numpy arrays if they have the same dimensions):

for filename in netCDF_list:
        ncfile=netCDF4.Dataset(filename)
        TSFC=ncfile.variables['T_SFC'][:]
        fillvalue=ncfile.variables['T_SFC']._FillValue
        TSFC=MA.masked_values(TSFC, fillvalue)
        TSFCWithOutNan=[]
        for a in TSFC:
                indexnonNaN=N.isfinite(a)
                SliceofTotoWithoutNan=a[indexnonNaN]
                print SliceofTotoWithoutNan
                TSFCWithOutNan.append(SliceofTotoWithoutNan)

        for i in xrange(0,len(TSFCWithOutNan)-1,1):
                slice_counter +=1
                #print slice_counter
                try:
                        running_sum=N.add(running_sum, TSFCWithOutNan[i])
                except NameError:
                        print "Initiating the running total of my variable..."
                        running_sum=N.array(TSFCWithOutNan[i])
...

2- Or do everything in the same loop:

slice_counter=0
for filename in netCDF_list:
        # ... read and mask TSFC as in option 1 ...
        for a in TSFC:
                indexnonNaN=N.isfinite(a)
                SliceofTotoWithoutNan=a[indexnonNaN]
                slice_counter +=1
                #print slice_counter
                try:
                        running_sum=N.add(running_sum, SliceofTotoWithoutNan)
                except NameError:
                        print "Initiating the running total of my variable..."
                        running_sum=N.array(SliceofTotoWithoutNan)
TSFC_avg=N.true_divide(running_sum, slice_counter)
N.set_printoptions(threshold='nan')
print "the TSFC_avg is:", TSFC_avg

See if it works; it is just a quick guess.
Xavier
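On the remark that numpy arrays can be stacked if they have the same dimensions: np.stack does require every input to have exactly the same shape, so full time slices stack, but NaN-filtered 1-D slices of different lengths do not. A small check, with a hypothetical helper can_stack:

```python
import numpy as np


def can_stack(arrays):
    """Return True if np.stack accepts the arrays (they must all share one shape)."""
    try:
        np.stack(arrays)
    except ValueError:
        return False
    return True


same_shape = [np.ones((2, 3)), np.zeros((2, 3))]               # e.g. two full time slices
filtered = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0])]  # e.g. two NaN-filtered slices
```

Stacking every slice also puts them all in memory at once, which is the memory problem described at the start of the thread, so for thousands of files a running sum remains the safer design.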

questions anon | 6 Dec 2011 05:27

Re: ignore NAN in numpy.true_divide()

Thanks again for your response. I must still be doing something wrong!
Both options resulted in:
the TSFC_avg is: [-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
 -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

1st option:

slice_counter=0

for filename in netCDF_list:
        ncfile=netCDF4.Dataset(filename)
        TSFC=ncfile.variables['T_SFC'][:]
        fillvalue=ncfile.variables['T_SFC']._FillValue
        TSFC=MA.masked_values(TSFC, fillvalue)
        TSFCWithOutNan=[]
        for a in TSFC:
                indexnonNaN=N.isfinite(a)
                SliceofTotoWithoutNan=a[indexnonNaN]
                print SliceofTotoWithoutNan
                TSFCWithOutNan.append(SliceofTotoWithoutNan)
        for i in xrange(0,len(TSFCWithOutNan)-1,1):
                slice_counter +=1
                try:
                        running_sum=N.add(running_sum, TSFCWithOutNan[i])
                except NameError:
                        print "Initiating the running total of my variable..."
                        running_sum=N.array(TSFCWithOutNan[i])

TSFC_avg=N.true_divide(running_sum, slice_counter)
N.set_printoptions(threshold='nan')
print "the TSFC_avg is:", TSFC_avg



2nd option:

for filename in netCDF_list:
        ncfile=netCDF4.Dataset(filename)
        TSFC=ncfile.variables['T_SFC'][:]
        fillvalue=ncfile.variables['T_SFC']._FillValue
        TSFC=MA.masked_values(TSFC, fillvalue)

        slice_counter=0
        for a in TSFC:
                indexnonNaN=N.isfinite(a)
                SliceofTotoWithoutNan=a[indexnonNaN]
                slice_counter +=1
                try:
                        running_sum=N.add(running_sum, SliceofTotoWithoutNan)
                except NameError:
                         print "Initiating the running total of my variable..."
                         running_sum=N.array(SliceofTotoWithoutNan)

TSFC_avg=N.true_divide(running_sum, slice_counter)
N.set_printoptions(threshold='nan')
print "the TSFC_avg is:", TSFC_avg
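One likely cause of the all "--" output: arithmetic on masked arrays propagates the mask, so a cell that is masked in any single slice is masked in the running sum, and with thousands of slices almost every cell can end up masked. The sketch below shows the propagation and one way around it, summing filled values and counting the valid cells separately (the slice values are made up):

```python
import numpy.ma as ma

# Two made-up slices of the same 3-cell grid, each with one masked cell.
s1 = ma.array([1.0, 2.0, 3.0], mask=[False, True, False])
s2 = ma.array([4.0, 5.0, 6.0], mask=[False, False, True])

total = ma.add(s1, s2)  # mask is [False, True, True]: masked in either input -> masked

# Workaround: sum the filled values and count the valid cells separately.
# (A cell whose count is 0 would still need to be masked before dividing.)
summed = s1.filled(0.0) + s2.filled(0.0)
counts = (~ma.getmaskarray(s1)).astype(int) + (~ma.getmaskarray(s2)).astype(int)
mean = summed / counts  # [2.5, 5.0, 3.0]
```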




