import logging
import warnings
from collections.abc import Collection, Mapping
from copy import deepcopy
from typing import Any, Callable, Optional, overload, Union

import torch
import torch.nn as nn
from torch import optim
from torch.distributed._shard.sharded_tensor import ShardedTensor
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

__all__: list[str] = []

logger = logging.getLogger(__name__)


class _NamedOptimizer(optim.Optimizer):
    """
    ``_NamedOptimizer`` takes a dict of parameters and exposes ``state_dict`` by parameter key.

    We replace the original key (number) in an optim with the
    fully qualified name (FQN) string. Users can initialize the optim as they
    would a plain PyTorch optim; the only difference is that they also need to
    pass in the FQN of each parameter.

    Args:
        named_parameters (Mapping[str, Union[torch.Tensor, ShardedTensor]]):
            Mapping from FQN to parameter.
        optimizer_class (optim.Optimizer):
            The class of optimizer to instantiate.
        param_groups (Collection[Mapping[str, Any]]):
            ``param_groups`` to pass to the optimizer if specified.
            The keys of the inner map need to be FQNs.
            Default: None
        module (nn.Module): the module whose parameters are updated
            by the optimizer.
        args: arguments to pass to the optimizer constructor.
        kwargs: arguments to pass to the optimizer constructor.

    Example::
        >>> # xdoctest: +SKIP("distributed")
        >>> from torch import optim
        >>> from torch.distributed.optim import _NamedOptimizer
        >>>
        >>> # Define the named optimizer.
        >>> m = Model(...)
        >>> named_optim = _NamedOptimizer(m.named_parameters(), optim.SGD)
        >>> # Forward pass + backward pass.
        >>> named_optim.step()
        >>> ...
        >>> # Calling state_dict on the named optimizer returns an FQN-keyed state_dict.
        >>> named_optim.state_dict()

    Warning: This API is still in development and subject to change.

    TODO: Add tutorial for _NamedOptimizer.
    TODO: Add documentation in the docstring for the public attributes
          like self.param_groups and self.named_parameters.
    """

    def __init__(
        self,
        named_parameters: Mapping[str, Union[torch.Tensor, ShardedTensor]],
        optimizer_class: optim.Optimizer,
        param_groups: Optional[Collection[Mapping[str, Any]]] = None,
        module: Optional[nn.Module] = None,
        *args,
        **kwargs,
    ) -> None:
        torch._C._log_api_usage_once("torch.distributed.optim._NamedOptimizer")
        self.param_groups = param_groups
        self._param_groups_check()
        self.named_parameters = dict(named_parameters)
        params_for_optimizer = (
            self.named_parameters.values() if param_groups is None else param_groups
        )
        self._optimizer = optimizer_class(params_for_optimizer, *args, **kwargs)
        self.module = module
        if param_groups is None:
            self.ordered_param_keys = list(self.named_parameters.keys())
        else:
            warnings.warn(
                "Since we pass in param_groups, we will use param_groups to "
                "initialize the optimizer, not all parameters of the module."
            )
            param_to_key = {param: key for key, param in self.named_parameters.items()}
            ordered_param_keys = []
            for group in param_groups:
                for param in group["params"]:
                    if param not in param_to_key:
                        raise ValueError(
                            f"Expect param name {param} found in param group but is missing."
                        )
                    ordered_param_keys.append(param_to_key[param])
            self.ordered_param_keys = ordered_param_keys
        # Update param_groups from the wrapped optimizer.
        self.param_groups = self._optimizer.param_groups

    def _param_groups_check(self):
        if self.param_groups is not None:
            for param_group in self.param_groups:
                assert isinstance(param_group, dict), "param group must be a dict"
                assert "params" in param_group, "param group must contain key params"
                params = param_group["params"]
                if isinstance(params, torch.Tensor):
                    params = [params]
                params = list(params)
                for param in params:
                    if not isinstance(param, torch.Tensor):
                        raise TypeError(
                            "optimizer can only optimize Tensors, "
                            "but one of the params is " + torch.typename(param)
                        )
                param_group["params"] = params

    def state_dict(self) -> dict[str, Any]:
        """
        Return the ``state_dict`` of the optimizer.

        Instead of using numbers to index
        parameters, we use the module's fully qualified name (FQN) as the key.
        """
        state_dict = self._optimizer.state_dict()
        param_groups = state_dict["param_groups"]

        ret_state = {
            self.ordered_param_keys[st_key]: state_val
            for st_key, state_val in state_dict["state"].items()
        }

        ret_groups = []
        for group in param_groups:
            param_keys = [self.ordered_param_keys[param] for param in group["params"]]
            ret_group = {"params": sorted(param_keys)}
            for k, v in group.items():
                if k != "params":
                    ret_group[k] = deepcopy(v)
            ret_groups.append(ret_group)

        return self._post_state_dict({"state": ret_state, "param_groups": ret_groups})

    @overload
    def step(self, closure: None = ...) -> None: ...

    @overload
    def step(self, closure: Callable[[], float]) -> float: ...

    def step(self, closure: Optional[Callable[[], float]] = None) -> Optional[float]:
        """
        Perform a single optimization step.

        This will call :meth:`torch.optim.Optimizer.step` on the wrapped
        optimizer.
        """
        return self._optimizer.step(closure=closure)

    @property
    def state(self) -> Mapping[torch.Tensor, Any]:
        return self._optimizer.state

    def load_state_dict(self, state_dict: Mapping[str, Any]) -> None:
        """
        Define the default behavior to load a state_dict for ``_NamedOptimizer``.

        Sample Code
        ```
            my_model = MyModule()
            optimizer = _NamedOptimizer(my_model.named_parameters(), Adagrad)
            ...

            optim_state_dict = optimizer.state_dict()
            ...
            ...

            optimizer.load_state_dict(optim_state_dict)
            ...
        ```
        Args:
            state_dict (Dict[str, Any]) : A ``state_dict`` to load into the optimizer.
                Note that this state dict update is performed in place.

        .. note:: PyTorch uses lazy init to initialize the optim states, so it
            is possible that there is no optim state when the user calls
            ``load_state_dict``. For ``_NamedOptimizer`` we are stricter: users
            can only call ``load_state_dict`` after the state is initialized.
            By doing this, we can validate the optim ``state_dict`` to be loaded.
        """
        new_state_dict = self._optimizer.state_dict()
        state_dict = self._pre_load_state_dict(state_dict)
        state = state_dict["state"]
        new_state = new_state_dict["state"]
        if len(new_state) == 0:
            raise ValueError(
                "Expects the optim to be initialized before load but found not initialized."
            )

        for idx, param_key in enumerate(self.ordered_param_keys):
            # Not every param is necessarily present in the loaded state_dict
            # (e.g. with conditional training), so skip missing ones.
            if param_key not in state.keys():
                continue
            if len(state[param_key]) != len(new_state[idx]):
                raise ValueError(
                    f"Expects equal length as {len(new_state[idx])} for parameter "
                    f"{param_key} but found: {len(state[param_key])}"
                )
            # Iterate through all optimizer states for this parameter.
            for state_key, state_val in new_state[idx].items():
                if state_key not in state[param_key]:
                    raise ValueError(
                        f"Expects state {state_key} for parameter {param_key} but not found."
                    )

                src_state_val = state[param_key][state_key]
                if isinstance(state_val, ShardedTensor):
                    assert isinstance(src_state_val, ShardedTensor)
                    num_shards = len(state_val.local_shards())
                    num_new_shards = len(src_state_val.local_shards())
                    if num_shards != num_new_shards:
                        raise ValueError(
                            f"Expects equal number of shards as {num_new_shards} "
                            f"but found {num_shards} for {param_key}/{state_key}"
                        )
                    for shard, src_shard in zip(
                        state_val.local_shards(), src_state_val.local_shards()
                    ):
                        shard.tensor.detach().copy_(src_shard.tensor)
                elif isinstance(state_val, torch.Tensor):
                    assert isinstance(src_state_val, torch.Tensor)
                    state_val.detach().copy_(src_state_val)
                else:
                    new_state[idx][state_key] = deepcopy(src_state_val)

        # Load the param_groups of the state_dict.
        src_param_groups = state_dict["param_groups"]
        new_param_groups = new_state_dict["param_groups"]

        src_group_map = {}
        for group in src_param_groups:
            param_keys = list(group["params"])
            src_group_map[_gen_param_group_key(param_keys)] = group
        new_group_map = {}
        for new_group in new_param_groups:
            param_keys = []
            for param_key in new_group["params"]:
                param_keys.append(self.ordered_param_keys[param_key])
            new_group_map[_gen_param_group_key(param_keys)] = new_group
        for group_key, new_group in new_group_map.items():
            # Not all params are necessarily used in training (or receive a
            # gradient), so a group may be missing from the source; skip it.
            if group_key not in src_group_map:
                continue
            src_group = src_group_map[group_key]
            if len(src_group) != len(new_group):
                raise ValueError(
                    f"Expects equal param_group size as {len(new_group)} for group "
                    f"{group_key} but found {len(src_group)}."
                )
            for k in src_group:
                if k not in new_group:
                    raise ValueError(
                        f"Expects group key {k} to be in group {group_key} "
                        f"in `state_dict` but is missing."
                    )
                if k != "params":
                    new_group[k] = deepcopy(src_group[k])

        self._optimizer.load_state_dict(new_state_dict)

    def add_param_group(self, param_group: Mapping[str, Any]) -> None:
        """
        Add a param group to the :class:`_NamedOptimizer`'s ``param_groups``.

        Warning: This API is still in development and subject to change.
        """
        assert isinstance(param_group, dict), "param group must be a dict"

        params = param_group["params"]
        if isinstance(params, torch.Tensor):
            param_group["params"] = [params]
        else:
            param_group["params"] = list(params)

        param_to_key = {param: key for key, param in self.named_parameters.items()}
        for param in param_group["params"]:
            if param not in param_to_key:
                raise ValueError("some parameters are not in the module")
            self.ordered_param_keys.append(param_to_key[param])

        self._optimizer.add_param_group(param_group)
        # Update param_groups from the wrapped optimizer.
        self.param_groups = self._optimizer.param_groups

    def init_state(self) -> None:
        """
        Run a dummy optimizer step, which allows us to initialize optimizer state because we do lazy init for most optimizers.

        This allows doing in-place loading of optimizer state from a checkpoint.
        """
        for param in self.named_parameters.values():
            if param.requires_grad:
                t = torch.zeros_like(param)
                param.grad = torch.autograd.Variable(t)
        # Calling ``step`` will load the initial state for optimizer states.
        self.step(closure=None)

    def _pre_load_state_dict(self, state_dict: dict[str, Any]) -> dict[str, Any]:
        # Translate an FSDP optim state_dict into one the wrapped optimizer
        # can consume before loading.
        if isinstance(self.module, FSDP):
            return FSDP.optim_state_dict_to_load(
                self.module, self._optimizer, state_dict, is_named_optimizer=True
            )
        return state_dict

    def _post_state_dict(self, state_dict: dict[str, Any]) -> dict[str, Any]:
        # Let FSDP post-process the optim state_dict (e.g. gather sharded
        # states) before it is returned to the caller.
        if isinstance(self.module, FSDP):
            FSDP.optim_state_dict(self.module, self._optimizer, state_dict)
        return state_dict


def _gen_param_group_key(param_keys: list[str]) -> str:
    """Concatenate all param keys as a unique identifier for one param group."""
    return "/".join(sorted(param_keys))