pyspark.pandas.extensions.register_series_accessor
- pyspark.pandas.extensions.register_series_accessor(name)
- Register a custom accessor with a Series object.
- Parameters
- name : str
- Name used when calling the accessor after it is registered.
 
- Returns
- callable
- A class decorator. 
 
- See also
- register_dataframe_accessor
- Register a custom accessor on DataFrame objects.
- register_index_accessor
- Register a custom accessor on Index objects.
- Notes
When accessed, your accessor will be initialized with the pandas-on-Spark object the user is interacting with. The code signature must be:

    def __init__(self, pandas_on_spark_obj):
        # constructor logic
        ...

In the pandas API, if data passed to your accessor has an incorrect dtype, it's recommended to raise an AttributeError for consistency purposes. In pandas-on-Spark, ValueError is more frequently used to signal that a value's datatype is unexpected for a given method or function.

Ultimately, you can structure this however you like, but pandas-on-Spark would likely do something like this:

    >>> ps.Series(['a', 'b']).dt
    ...
    Traceback (most recent call last):
        ...
    ValueError: Cannot call DatetimeMethods on type StringType()

- Examples
In your library code:

    from pyspark.pandas.extensions import register_series_accessor

    @register_series_accessor("geo")
    class GeoAccessor:

        def __init__(self, pandas_on_spark_obj):
            self._obj = pandas_on_spark_obj

        @property
        def is_valid(self):
            # boolean check to see if series contains valid geometry
            return True

Then, in an ipython session:

    >>> import numpy as np
    >>> import pyspark.pandas as ps
    >>> ## Import if the accessor is in the other file.
    >>> # from my_ext_lib import GeoAccessor
    >>> psdf = ps.DataFrame({"longitude": np.linspace(0, 10),
    ...                      "latitude": np.linspace(0, 20)})
    >>> psdf.longitude.geo.is_valid
    True
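The notes above say that the accessor is initialized with the object the user is interacting with, and that a ValueError is a common way to reject an unexpected datatype. The following is a minimal, pure-Python sketch of how a registration decorator of this kind can work. It is not the pyspark.pandas implementation: `AccessorDescriptor`, `register_accessor`, and `ToySeries` are hypothetical names invented for illustration, and `ToySeries` only stands in for a real Series so that the sketch runs without a Spark session.

```python
class AccessorDescriptor:
    """Hypothetical descriptor that builds an accessor when it is looked up."""

    def __init__(self, name, accessor_cls):
        self._name = name
        self._accessor_cls = accessor_cls

    def __get__(self, obj, owner):
        if obj is None:
            # Looked up on the class itself: hand back the accessor class.
            return self._accessor_cls
        # Looked up on an instance: initialize the accessor with that object.
        return self._accessor_cls(obj)


def register_accessor(name, target_cls):
    """Return a class decorator that attaches an accessor to target_cls."""

    def decorator(accessor_cls):
        setattr(target_cls, name, AccessorDescriptor(name, accessor_cls))
        return accessor_cls

    return decorator


class ToySeries:
    """Stand-in for a Series (assumption): holds values and a dtype label."""

    def __init__(self, values, dtype):
        self.values = list(values)
        self.dtype = dtype


@register_accessor("geo", ToySeries)
class GeoAccessor:
    def __init__(self, obj):
        # Reject unexpected datatypes up front, mirroring the ValueError
        # convention described in the notes.
        if obj.dtype != "float":
            raise ValueError(f"Cannot call GeoAccessor on type {obj.dtype}")
        self._obj = obj

    @property
    def is_valid(self):
        # Toy validity rule (assumption): every value is a plausible longitude.
        return all(-180.0 <= v <= 180.0 for v in self._obj.values)


longitudes = ToySeries([0.0, 10.0], "float")
print(longitudes.geo.is_valid)
```

Validating in `__init__` means the error surfaces as soon as the accessor is reached (for example `ToySeries(["a"], "str").geo`), before any accessor method is called. Whether to cache the accessor instance on the object after the first lookup is a separate design choice that this sketch leaves out.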