Notice
Recent Posts
Recent Comments
일 | 월 | 화 | 수 | 목 | 금 | 토 |
---|---|---|---|---|---|---|
1 | 2 | 3 | 4 | 5 | 6 | 7 |
8 | 9 | 10 | 11 | 12 | 13 | 14 |
15 | 16 | 17 | 18 | 19 | 20 | 21 |
22 | 23 | 24 | 25 | 26 | 27 | 28 |
29 | 30 | 31 |
Tags
- github
- AI 경진대회
- 금융문자분석경진대회
- Docker
- programmers
- 프로그래머스 파이썬
- SW Expert Academy
- ubuntu
- Real or Not? NLP with Disaster Tweets
- 캐치카페
- gs25
- 편스토랑 우승상품
- 백준
- 맥북
- Kaggle
- 프로그래머스
- hackerrank
- 자연어처리
- 코로나19
- PYTHON
- 파이썬
- 편스토랑
- dacon
- 우분투
- 더현대서울 맛집
- ChatGPT
- Baekjoon
- 데이콘
- Git
- leetcode
Archives
- Today
- Total
솜씨좋은장씨
[Python] Pandas ValueError: cannot reindex from a duplicate axis 원인과 해결방법 본문
Programming/Python
[Python] Pandas ValueError: cannot reindex from a duplicate axis 원인과 해결방법
솜씨좋은장씨 2020. 11. 9. 14:42728x90
반응형
pandas를 활용하여 읽어온 csv 파일에서 특정 column에 대한 중복값을 제거하기 위하여
아래와 같이 먼저 파일을 엑셀에서 읽어온 후
import pandas as pd
data = pd.read_excel("./42_nia_pqa_QueryTemplate.xlsx")
data_fillna = data.fillna(0.0)
data_fillna_dropna = data_fillna[data_fillna['property'] != 0.0]
data_fillna_dropna.head()
특정 값 이외의 값만 남기기 위하여 위와 같이 코드를 실행하였으나
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-25-95f7c3a3bb77> in <module>
----> 1 data_fillna_dropna = data_fillna[data_fillna['property'] != 0.0]
2 data_fillna_dropna.head()
~/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
2912 # Do we have a (boolean) DataFrame?
2913 if isinstance(key, DataFrame):
-> 2914 return self._getitem_frame(key)
2915
2916 # Do we have a (boolean) 1d indexer?
~/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in _getitem_frame(self, key)
3008 if key.values.size and not is_bool_dtype(key.values):
3009 raise ValueError('Must pass DataFrame with boolean values only')
-> 3010 return self.where(key)
3011
3012 def query(self, expr, inplace=False, **kwargs):
~/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in where(self, cond, other, inplace, axis, level, errors, try_cast, raise_on_error)
8832 other = com.apply_if_callable(other, self)
8833 return self._where(cond, other, inplace, axis, level,
-> 8834 errors=errors, try_cast=try_cast)
8835
8836 @Appender(_shared_docs['where'] % dict(_shared_doc_kwargs, cond="False",
~/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in _where(self, cond, other, inplace, axis, level, errors, try_cast)
8686 errors=errors,
8687 try_cast=try_cast, axis=block_axis,
-> 8688 transpose=self._AXIS_REVERSED)
8689
8690 return self._constructor(new_data).__finalize__(self)
~/anaconda3/lib/python3.7/site-packages/pandas/core/internals/managers.py in where(self, **kwargs)
505
506 def where(self, **kwargs):
--> 507 return self.apply('where', **kwargs)
508
509 def setitem(self, **kwargs):
~/anaconda3/lib/python3.7/site-packages/pandas/core/internals/managers.py in apply(self, f, axes, filter, do_integrity_check, consolidate, **kwargs)
391 axis = getattr(obj, '_info_axis_number', 0)
392 kwargs[k] = obj.reindex(b_items, axis=axis,
--> 393 copy=align_copy)
394
395 applied = getattr(b, f)(**kwargs)
~/anaconda3/lib/python3.7/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
195 @wraps(func)
196 def wrapper(*args, **kwargs):
--> 197 return func(*args, **kwargs)
198
199 if not PY2:
~/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in reindex(self, *args, **kwargs)
3807 kwargs.pop('axis', None)
3808 kwargs.pop('labels', None)
-> 3809 return super(DataFrame, self).reindex(**kwargs)
3810
3811 @Appender(_shared_docs['reindex_axis'] % _shared_doc_kwargs)
~/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in reindex(self, *args, **kwargs)
4354 # perform the reindex on the axes
4355 return self._reindex_axes(axes, level, limit, tolerance, method,
-> 4356 fill_value, copy).__finalize__(self)
4357
4358 def _reindex_axes(self, axes, level, limit, tolerance, method, fill_value,
~/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in _reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
3734 if columns is not None:
3735 frame = frame._reindex_columns(columns, method, copy, level,
-> 3736 fill_value, limit, tolerance)
3737
3738 index = axes['index']
~/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in _reindex_columns(self, new_columns, method, copy, level, fill_value, limit, tolerance)
3759 return self._reindex_with_indexers({1: [new_columns, indexer]},
3760 copy=copy, fill_value=fill_value,
-> 3761 allow_dups=False)
3762
3763 def _reindex_multi(self, axes, copy, fill_value):
~/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in _reindex_with_indexers(self, reindexers, fill_value, copy, allow_dups)
4488 fill_value=fill_value,
4489 allow_dups=allow_dups,
-> 4490 copy=copy)
4491
4492 if copy and new_data is self._data:
~/anaconda3/lib/python3.7/site-packages/pandas/core/internals/managers.py in reindex_indexer(self, new_axis, indexer, axis, fill_value, allow_dups, copy)
1222 # some axes don't allow reindexing with dups
1223 if not allow_dups:
-> 1224 self.axes[axis]._can_reindex(indexer)
1225
1226 if axis >= self.ndim:
~/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in _can_reindex(self, indexer)
3085 # trying to reindex on an axis with duplicates
3086 if not self.is_unique and len(indexer):
-> 3087 raise ValueError("cannot reindex from a duplicate axis")
3088
3089 def reindex(self, target, method=None, level=None, limit=None,
ValueError: cannot reindex from a duplicate axis
위와 같은 오류가 발생하였습니다.
원인
데이터를 확인해보니 중복된 column이 존재하였고 이 중복된 column으로 인하여 발생하는 오류였습니다.
해결 방법
해결 방법은 중복된 두 column 중에 하나의 column 이름을 변경해주면 됩니다.
columns = list(data_fillna_dropna.columns)
columns[3] = "property_not_use"
data_fillna_dropna.columns = columns
방법은 정말 다양하게 존재하지만 저는 기존 columns 를 불러와서 중복된 column 중 활용하지 않는 column 만
이름을 변경해 보았습니다.
이렇게 변경하고 나면 정상적으로 동작하는 것을 확인할 수 있을 겁니다.
읽어주셔서 감사합니다.
'Programming > Python' 카테고리의 다른 글
Comments