Using get_loc to get index of multiple values by iterating over a dataframe in Pandas | Coursera Community

Using get_loc to get index of multiple values by iterating over a dataframe in Pandas

  • 22 July 2020
  • 1 reply
  • 1419 views

Badge +1

`Events` is the DataFrame with date as index. It looks like this:

           

  co_code
    co_stkdate    
    2009-03-17    11
    2010-02-03    11
    2011-02-14    363
    2015-01-09    363
    2010-10-15    365


`residual` is the other dataframe with date as index and contains the elements in co_code of `events` dataframe as the column names. residual looks like this (has more than 700 columns but i've posted 3 for reference):

       

                   11    363       365    
    co_stkdate
    1997-07-02    NaN    -12.134525    NaN 
    1997-07-04    NaN    -3.663248    -15.703843 
    1997-07-07    NaN    -30.649876    3.400623
    1997-07-08    NaN    17.924305    -6.188777
    1997-07-10    NaN    -25.828099    -0.615380 

      
I want to compare the two dataframes to find the common dates for each column of `residual` dataframe individually and extract the specific row and its adjacent rows for each column which has a matching date in `events` dataframe. Since the dataset is very large, I want to iterate through each column of residual to compare the date in accordance with the column name (that matches with the `events` dataframe). Hence, I tried the following code:

carvalues = {}
    for code in residual.columns:
      for c in events['co_code']:
        if (code == c):
          for elem in events['co_stkdate']:
            for dates in residual.index:
              if (elem == dates):
                if pd.notnull(residual.loc[dates, code]):
                  idx=residual.index.get_loc(dates, code. method=None)
                  carvalues = residual.iloc[idx - 10 : idx +10] 


But I keep getting the following error:

TypeError: get_loc() got multiple values for argument 'method'

The expected output: For example, extract 10 rows (from the `residual` dataframe) above and below the date 2009-03-17 corresponding to 'co_code'=11 (given in `events` dataframe). And expect the output for date 2009-03-17, corresponding to 'co_code'=11 to be:

   

    co_stkdate  11
    2009-02-25  4.467442
    2009-02-26  4.921655
    2009-02-27  -4.875917
    2009-03-02  1.895546
    2009-03-03  -3.162370
    2009-03-06  85.396542
    2009-03-09  43.233098
    2009-03-12  11.389193
    2009-03-13  -68.633160
    2009-03-16  0.329175
    2009-03-17  -0.049623
    2009-03-18  3.584602
    2009-03-19  -3.602577
    2009-03-20  -1.532591
    2009-03-23  2.766331
    2009-03-24  0.487590
    2009-03-25  -3.541044
    2009-03-26  -5.055355
    2009-03-27  0.887624
    2009-03-30  2.530087

Similarly, next I want the output for `co_stkdate=2010-02-03`  & `co_code=11` and then for `co_stkdate=2011-02-14` & `co_code=363` and so on (as given in `events` dataframe). How can I remove the error? Any guidance on the best way to do this would be much appreciated.


1 reply

Badge

Iterating through pandas dataFrame objects is generally slow. Pandas iteration beats the whole purpose of using DataFrame. It is an anti-pattern and is something you should only do when you have exhausted every other option. It is better look for a List Comprehensions , vectorized solution or DataFrame.apply() method.

Pandas DataFrame loop using list comprehension

result = [(x, y,z) for x, y,z in zip(df['Name'], df['Promoted'],df['Grade'])]

 

Reply