Dowemo
0 0 0 0


Question:

I have a DataFrames from pandas:

import pandas as pd
inp = [{'c1':10, 'c2':100}, {'c1':11,'c2':110}, {'c1':12,'c2':120}]
df = pd.DataFrame(inp)
print df

Output:

   c1   c2
0  10  100
1  11  110
2  12  120

Now I want to iterate over the rows of the above frame. For every row I want to be able to access its elements (values in cells) by the name of the columns. So, for example, I would like to have something like that:

for row in df.rows:
   print row['c1'], row['c2']

Is it possible to do that in pandas?

I found similar question. But it does not give me the answer I need. For example, it is suggested there to use:

for date, row in df.T.iteritems():

or

for row in df.iterrows():

But I do not understand what the row object is and how I can work with it.


Best Answer:


To iterate through DataFrame's row in pandas one can use:

itertuples() is supposed to be faster than iterrows()

But be aware, according to the docs (pandas 0.21.1 at the moment):

  • iterrows: dtype might not match from row to row

    Because iterrows returns a Series for each row, it does not preserve dtypes across the rows (dtypes are preserved across columns for DataFrames).

  • iterrows: Do not modify rows

    You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect.

    Use DataFrame.apply() instead:

    new_df = df.apply(lambda x: x * 2)
  • itertuples:

    The column names will be renamed to positional names if they are invalid Python identifiers, repeated, or start with an underscore. With a large number of columns (>255), regular tuples are returned.




Copyright © 2011 Dowemo All rights reserved.    Creative Commons   AboutUs