[Tutor] group txt files by month


[Tutor] group txt files by month

questions anon
I think what I am trying to do is relatively easy, but I can't get my head around how to do it.
I have a list of txt files that contain daily rainfall for many years. I would like to produce a list that contains the year-month and the max, min and mean of rainfall for each month.
My main question at this stage is: how do I group the files by month for each year?
The files are named like this:
r20110101.txt
r20110102.txt
r20110103.txt
r20110104.txt
r20110105.txt
r20110106.txt
r20110107.txt
r20110108.txt
r20110109.txt
r20110110.txt
r20110111.txt
r20110112.txt
r20110113.txt
r20110114.txt
r20110115.txt
r20110116.txt
r20110117.txt
r20110118.txt

and so on for each day for many years.

So far I am able to open each file and calculate its max, min and mean (see below), but I am not sure how to group the results by month for each year.

import os

import numpy as np

MainFolder=r"E:/Rainfalldata/"
outputFolder=r"E:/test/"
for (path, dirs, files) in os.walk(MainFolder):
    for fname in files:
        if fname.endswith('.txt'):
            filename=os.path.join(path, fname)
            # skip the first 6 (header) lines, then read the rainfall values
            f=np.genfromtxt(filename, skip_header=6)
            print(f.max(), f.min(), f.mean())

The ideal output would be something like:
year-month   max   min   mean
2010-12      100     0     50
2011-01      200     0    100
2011-02       50     0     25


Any feedback will be greatly appreciated.
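Each file name already carries the date, so the year-month label in the ideal output can be derived from the name alone. A minimal sketch, assuming every file is named exactly "r" + YYYYMMDD + ".txt"; year_month is a hypothetical helper, not something from this thread:

```python
from datetime import datetime


def year_month(fname):
    """Turn a daily file name such as 'r20110101.txt' into '2011-01'.

    Assumes the name is exactly 'r' + YYYYMMDD + '.txt'; strptime raises
    ValueError for any name that does not fit that pattern.
    """
    day = datetime.strptime(fname, "r%Y%m%d.txt")
    return day.strftime("%Y-%m")


print(year_month("r20110101.txt"))  # prints 2011-01
```

Because strptime rejects names that don't fit the pattern, a stray file in the folder raises an error instead of silently landing in the wrong month.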



_______________________________________________
Tutor maillist  -  [hidden email]
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor

Re: [Tutor] group txt files by month

Asokan Pichai-2
On Tue, Apr 3, 2012 at 9:29 AM, questions anon <[hidden email]> wrote:

> I think what I am trying to do is relatively easy but can't get my head
> around how to do it.
> I have a list of txt files that contain daily rainfall for many years. I
> would like to produce a list that contains the year-month and the max, min
> and mean of rainfall for each month.
> My main question at this stage is how to group the files by each month for
> each year?
> They are set out like:
> r20110101.txt
> r20110102.txt
> r20110103.txt
> r20110104.txt
> r20110105.txt
> r20110106.txt
> r20110107.txt
> r20110108.txt
> r20110109.txt
> r20110110.txt
> r20110111.txt
> r20110112.txt
> r20110113.txt
> r20110114.txt
> r20110115.txt
> r20110116.txt
> r20110117.txt
> r20110118.txt
>
> and so on for each day for many years.
>
> so far I am able to open each file and calculate the max, min and mean for
> each file (see below) but not sure about grouping to monthly for each year.

# ---------------------
Monthwise = {}
# ----------------------

> MainFolder=r"E:/Rainfalldata/"
> outputFolder=r"E:/test/"
> for (path, dirs, files) in os.walk(MainFolder):
>     path=path+'/'
>     for fname in files:
>         if fname.endswith('.txt'):
>             filename=path+fname
>             f=np.genfromtxt(filename, skip_header=6)
>             print f.max(), f.min(), f.mean()

Replace the last two lines with:
# --------------------------
            Monthwise.setdefault(fname[1:7], []).append(np.genfromtxt(filename, skip_header=6))
# -------------------------

Now at the end you have a dictionary whose keys are strings like '201012'
and whose values are lists holding one array per daily file of that month.

You can now iterate over the sorted keys of Monthwise
and print appropriately.
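Each month key has to collect several days of data, so the dictionary values need to be lists. A minimal sketch of the dictionary idea using collections.defaultdict; it stores file names rather than loaded arrays, so each month could be read with np.genfromtxt only when its stats are computed. group_by_month and the example names are hypothetical:

```python
from collections import defaultdict


def group_by_month(fnames):
    """Map 'YYYYMM' (fname[1:7]) to the list of .txt names for that month."""
    monthwise = defaultdict(list)  # a missing key starts out as an empty list
    for fname in fnames:
        if fname.endswith('.txt'):
            monthwise[fname[1:7]].append(fname)
    return dict(monthwise)


monthwise = group_by_month(["r20110201.txt", "r20110102.txt", "r20110101.txt"])
for key in sorted(monthwise):
    # '201101' -> '2011-01', the label used in the ideal output
    print(key[:4] + "-" + key[4:], sorted(monthwise[key]))
```

Sorting the keys works because 'YYYYMM' strings sort in date order.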

[Ideal output etc SNIPPED]

HTH

Asokan Pichai

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco

Re: [Tutor] group txt files by month

Alan Gauld
In reply to this post by questions anon
On 03/04/12 04:59, questions anon wrote:

> I have a list of txt files that contain daily rainfall for many years.
> They are set out like:
> r20110101.txt
> r20110102.txt
> r20110103.txt
> and so on for each day for many years.
>
> MainFolder=r"E:/Rainfalldata/"
> outputFolder=r"E:/test/"
> for (path, dirs, files) in os.walk(MainFolder):

If the files are all in a single folder, you might be better off using
glob.glob() rather than os.walk(). You can pass a filename pattern
like *.txt to glob(). This might make it easier to group the
files by year: r2010*.txt, for example.

You can do it with walk() too, it's just a bit more effort. But if the files
are in multiple folders, walk() is probably better.
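The glob idea extends to single months without typing a separate command for each one: build the pattern from the year and month numbers inside a loop. A sketch assuming all files sit directly in one folder; month_pattern and month_files are hypothetical names:

```python
import glob
import os


def month_pattern(folder, year, month):
    """Glob pattern for one month's daily files, e.g. <folder>/r201101??.txt."""
    # '??' matches exactly two characters: the day of the month
    return os.path.join(folder, "r%04d%02d??.txt" % (year, month))


def month_files(folder, year, month):
    """Sorted list of that month's files; empty if none exist."""
    return sorted(glob.glob(month_pattern(folder, year, month)))


# One loop instead of a separately typed command per month
for month in range(1, 13):
    files = month_files("E:/Rainfalldata", 2011, month)
    print("2011-%02d" % month, len(files), "files")
```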

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/


Re: [Tutor] group txt files by month

questions anon
Thanks for responding.
Glob and os.walk will work, but I would need to type up a separate command for each month of each year, and that doesn't seem very efficient. Is there a way to make it go through and group txt files with similar filenames?
e.g. something like:
if fname.endswith('.txt') and fname[0:7] == fname[0:7]
e.g. r20110101.txt and r20110102.txt should go together, but r20110601.txt should not.
Thanks
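The grouping asked for here needs a key that all files of one month share, namely the year-month digits fname[1:7]. The standard library's itertools.groupby can do this once the names are sorted. A sketch, not from the thread; group_sorted_by_month is a hypothetical name:

```python
from itertools import groupby


def group_sorted_by_month(fnames):
    """Return [(yyyymm, [file names]), ...] for every .txt name, in date order.

    groupby only merges *adjacent* items with equal keys, so the names are
    sorted first; for names like r20110101.txt, string order is date order.
    """
    txt = sorted(f for f in fnames if f.endswith('.txt'))
    # f[1:7] drops the leading 'r' and keeps YYYYMM: 'r20110101.txt' -> '201101'
    return [(key, list(group)) for key, group in groupby(txt, key=lambda f: f[1:7])]


for key, names in group_sorted_by_month(["r20110102.txt", "r20110601.txt", "r20110101.txt"]):
    print(key, names)
```

With these inputs, r20110101.txt and r20110102.txt end up in the '201101' group and r20110601.txt in its own '201106' group.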

On Tue, Apr 3, 2012 at 4:59 PM, Alan Gauld <[hidden email]> wrote:
[quoted message SNIPPED]

Re: [Tutor] group txt files by month

questions anon
I have been able to write up what I want to do (using glob), but I am not sure how to loop it or simplify it to make the script more efficient.
I am currently:
- grouping the same months in a year using glob
- opening the files in a group and combining the data using a list
- finding max, min etc. for the list and printing it

I need to do this for many years, and therefore many months, so I really need a way to make this more efficient.
Any feedback will be greatly appreciated.

import glob

import numpy as np

MainFolder=r"E:/rainfall-2011/"
OutputFolder=r"E:/test_out/"
r201101=glob.glob(MainFolder+"r201101??.txt")
r201102=glob.glob(MainFolder+"r201102??.txt")
r201103=glob.glob(MainFolder+"r201103??.txt")

rain201101=[]
rain201102=[]
rain201103=[]
monthlyrainfall=[]

for ifile in r201101:
    f=np.genfromtxt(ifile, skip_header=6)
    rain201101.append(f)

for ifile in r201102:
    f=np.genfromtxt(ifile, skip_header=6)
    rain201102.append(f)

for ifile in r201103:
    f=np.genfromtxt(ifile, skip_header=6)
    rain201103.append(f)

# np.max etc. treat each list of equally shaped grids as one stacked array
print("jan", np.max(rain201101), np.min(rain201101), np.mean(rain201101), np.median(rain201101), np.std(rain201101))
print("feb", np.max(rain201102), np.min(rain201102), np.mean(rain201102), np.median(rain201102), np.std(rain201102))
print("mar", np.max(rain201103), np.min(rain201103), np.mean(rain201103), np.median(rain201103), np.std(rain201103))
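The three copied blocks differ only in the month number, so a loop over months can replace them. A minimal sketch, assuming one folder per year as in MainFolder above and that every daily grid has the same shape; monthly_stats and summarise_year are hypothetical names:

```python
import glob
import os

import numpy as np


def monthly_stats(arrays):
    """Return (max, min, mean, median, std) over every grid from one month.

    np.stack assumes all daily grids have the same shape; it raises
    ValueError otherwise.
    """
    data = np.stack(arrays)  # shape: (days, rows, cols)
    return (data.max(), data.min(), data.mean(), np.median(data), data.std())


def summarise_year(main_folder, year):
    """Print one line of stats per month of `year` found in main_folder."""
    for month in range(1, 13):
        # e.g. E:/rainfall-2011/r201101??.txt; '??' matches the two-digit day
        pattern = os.path.join(main_folder, "r%04d%02d??.txt" % (year, month))
        files = sorted(glob.glob(pattern))
        if not files:
            continue  # no data for this month: skip rather than fail on empty stats
        arrays = [np.genfromtxt(f, skip_header=6) for f in files]
        print("%04d-%02d" % (year, month), *monthly_stats(arrays))
```

Usage would be something like summarise_year(r"E:/rainfall-2011/", 2011). Keeping the statistics in their own function makes them easy to check on small test arrays.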


On Thu, Apr 5, 2012 at 11:11 AM, questions anon <[hidden email]> wrote:
[quoted messages SNIPPED]

Re: [Tutor] group txt files by month

Peter Otten
questions anon wrote:

> I have been able to write up what I want to do (using glob) but I am not
> sure how to loop it or simplify it to make the script more efficient.
> I am currently:
> -grouping the same months in a year using glob
> -opening the files in a group and combining the data using a list
> -finding max, min etc for the list and printing it
>
> I need to do this for many years and therefore many months so really need
> a way to make this more efficient.
> Any feedback will be greatly appreciated
>
> [code SNIPPED]

Strip the code down to one month

> r201101=glob.glob(MainFolder+"r201101??.txt")
> rain201101=[]
> for ifile in r201101:
>     f=np.genfromtxt(ifile, skip_header=6)
>     rain201101.append(f)


then turn it into a function, roughly

import glob

GLOBTEMPLATE = "e:/rainfall-{year}/r{year}{month:02}??.txt"
def accumulate_month(year, month):
    files = glob.glob(GLOBTEMPLATE.format(year=year, month=month))
    # read files, calculate and write stats

and finally put it into a loop:

from datetime import date, timedelta
stop_month = date(2012, 4, 1)
month = date(2011, 1, 1)  # date, not datetime: comparing a datetime with a date raises TypeError
while month < stop_month:
    accumulate_month(month.year, month.month)
    month += timedelta(days=32)
    month = month.replace(day=1)
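The month-stepping loop can also be packaged as a generator, which makes the timedelta(days=32) trick reusable and easy to check. A sketch of the same technique; month_starts is a hypothetical name:

```python
from datetime import date, timedelta


def month_starts(start, stop):
    """Yield the 1st of every month from start's month up to (not including) stop."""
    month = start.replace(day=1)
    while month < stop:
        yield month
        # From the 1st, +32 days always lands in the next month (no month has
        # more than 31 days or fewer than 28), then snap back to its 1st.
        month = (month + timedelta(days=32)).replace(day=1)


for first in month_starts(date(2011, 11, 1), date(2012, 3, 1)):
    print(first.year, first.month)  # 2011 11, 2011 12, 2012 1, 2012 2
```

The loop above would then become: for m in month_starts(date(2011, 1, 1), date(2012, 4, 1)): accumulate_month(m.year, m.month)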





Re: [Tutor] group txt files by month

questions anon
Thank you for this response; it was a tremendous help.
It still took me a while to work it all out, so I thought I would post what worked for me.
Thanks again

import glob
from datetime import datetime, timedelta

import numpy as np

GLOBTEMPLATE = r"e:/rainfall-{year}/r{year}{month:02}??.txt"

def accumulate_month(year, month):
    files = glob.glob(GLOBTEMPLATE.format(year=year, month=month))
    if not files:
        return  # no data for this month; np.max([]) would raise ValueError
    monthlyrain=[]
    for ifile in files:
        f=np.genfromtxt(ifile,skip_header=6)
        monthlyrain.append(f)
    # {:02} zero-pads the month, giving e.g. 2011-01
    print("year-month: {}-{:02}, maximum: {}, minimum: {}, mean: {}".format(
        year, month, np.max(monthlyrain), np.min(monthlyrain), np.mean(monthlyrain)))

stop_month = datetime(2011, 12, 31)
month = datetime(2011, 1, 1)  # leading zeros such as 01 are a SyntaxError in Python 3
while month < stop_month:
    accumulate_month(month.year, month.month)
    month += timedelta(days=32)
    month = month.replace(day=1)
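The code above prints its results, while the earlier scripts define outputFolder / OutputFolder without using them. A minimal sketch of writing the monthly table to a file with the standard csv module, assuming accumulate_month is changed to return a (year_month, max, min, mean) tuple instead of printing; to_csv_text and write_monthly_csv are hypothetical names:

```python
import csv
import io
import os


def to_csv_text(rows):
    """Render (year_month, max, min, mean) tuples as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["year-month", "max", "min", "mean"])
    writer.writerows(rows)
    return buf.getvalue()


def write_monthly_csv(rows, output_folder, name="monthly_rainfall.csv"):
    """Write the table to output_folder/name and return the file's path."""
    path = os.path.join(output_folder, name)
    # newline="" writes the csv module's '\r\n' line endings unchanged
    with open(path, "w", newline="") as out:
        out.write(to_csv_text(rows))
    return path
```

For example, write_monthly_csv([("2011-01", 200.0, 0.0, 100.0)], r"E:/test/") would produce a file laid out like the "ideal output" table, with commas instead of spaces.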


On Thu, Apr 5, 2012 at 4:57 PM, Peter Otten <[hidden email]> wrote:
[quoted message SNIPPED]