import logging
import warnings
from collections.abc import Collection, Mapping
from copy import deepcopy
from typing import Any, Callable, Optional, overload, Union

import torch
import torch.nn as nn
from torch import optim
from torch.distributed._shard.sharded_tensor import ShardedTensor
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


__all__: list[str] = []

logger = logging.getLogger(__name__)


class _NamedOptimizer(optim.Optimizer):
    """
    ``_NamedOptimizer`` takes a dict of parameters and exposes ``state_dict`` by parameter key.

    We replace the original numeric keys in the optimizer state with each
    parameter's fully qualified name (FQN) string. Users can initialize the
    optimizer as they would a regular PyTorch optimizer; the only difference
    is that they also need to pass in the FQN of each parameter.

    Args:
        named_parameters (Mapping[str, Union[torch.Tensor, ShardedTensor]]):
            Mapping from FQN to parameter.
        optimizer_class (optim.Optimizer):
            The class of optimizer to instantiate.
        param_groups (Collection[Mapping[str, Any]]):
            `param_groups` to pass to the optimizer if specified.
            The key of the inner map needs to be FQNs.
            Default: None
        module (nn.Module): the module whose parameters are updated
            by the optimizer.
        args: positional arguments to pass to the optimizer constructor.
        kwargs: keyword arguments to pass to the optimizer constructor.

    Example::
        >>> # xdoctest: +SKIP("distributed")
        >>> from torch import optim
        >>> from torch.distributed.optim import _NamedOptimizer
        >>>
        >>> # Define the named optimizer.
        >>> m = Model(...)
        >>> named_optim = _NamedOptimizer(m.named_parameters(), optim.SGD)
        >>> # Forward pass + backward pass.
        >>> named_optim.step()
        >>> ...
        >>> # Calling state_dict on the named optimizer returns an FQN-keyed state_dict.
        >>> named_optim.state_dict()
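        >>>
        >>> # A hedged sketch of restoring from the FQN-keyed checkpoint; the FQN
        >>> # "linear.weight" below is illustrative and depends on the model.
        >>> ckpt = named_optim.state_dict()
        >>> # ckpt["state"]["linear.weight"] holds that parameter's optimizer state.
        >>> named_optim.load_state_dict(ckpt)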

    Warning: This API is still in development and subject to change.

    TODO: Add tutorial for _NamedOptimizer.
    TODO: Add documentation in the docstring for the public attributes
          like self.param_groups and self.named_parameters.
    """

    def __init__(
        self,
        named_parameters: Mapping[str, Union[torch.Tensor, ShardedTensor]],
        optimizer_class: optim.Optimizer,
        param_groups: Optional[Collection[Mapping[str, Any]]] = None,
        module: Optional[nn.Module] = None,
        *args: tuple[Any, ...],
        **kwargs: dict[str, Any],
    ) -> None:
        torch._C._log_api_usage_once("torch.distributed.optim._NamedOptimizer")
        self.param_groups = param_groups
        self._param_groups_check()
        self.named_parameters = dict(named_parameters)
        params_for_optimizer = (
            self.named_parameters.values() if param_groups is None else param_groups
        )
        self._optimizer = optimizer_class(
            params_for_optimizer,
            *args,
            **kwargs,
        )
        self.module = module
        if param_groups is None:
            self.ordered_param_keys = list(self.named_parameters.keys())
        else:
            warnings.warn(
                "Since we pass in param_groups, we will use param_groups to "
                "initialize the optimizer, not all parameters of the module."
            )
            param_to_key = {param: key for key, param in self.named_parameters.items()}
            ordered_param_keys = []
            for group in param_groups:
                for param in group["params"]:
                    if param not in param_to_key:
                        raise ValueError(
                            f"Expect param name {param} found in param group but is missing."
                        )
                    ordered_param_keys.append(param_to_key[param])
            self.ordered_param_keys = ordered_param_keys
        # Keep param_groups in sync with the wrapped optimizer.
        self.param_groups = self._optimizer.param_groups

    def _param_groups_check(self):
        if self.param_groups is not None:
            for param_group in self.param_groups:
                assert isinstance(param_group, dict), "param group must be a dict"
                assert "params" in param_group, "param group must contain key params"
                params = param_group["params"]
                if isinstance(params, torch.Tensor):
                    params = [params]
                params = list(params)
                for param in params:
                    if not isinstance(param, torch.Tensor):
                        raise TypeError(
                            "optimizer can only optimize Tensors, "
                            "but one of the params is " + torch.typename(param)
                        )
                param_group["params"] = params

    def state_dict(self) -> dict[str, Any]:
        """
        Return the ``state_dict`` of the optimizer.

        Instead of using numbers to index
        parameters, we use the module's fully qualified name (FQN) as the key.
        """
        state_dict = self._optimizer.state_dict()
        param_groups = state_dict["param_groups"]

        ret_state = {
            self.ordered_param_keys[st_key]: state_val
            for st_key, state_val in state_dict["state"].items()
        }

        ret_groups = []
        for group in param_groups:
            param_keys = [self.ordered_param_keys[param] for param in group["params"]]
            ret_group = {"params": sorted(param_keys)}
            for k, v in group.items():
                if k != "params":
                    ret_group[k] = deepcopy(v)
            ret_groups.append(ret_group)

        return self._post_state_dict({"state": ret_state, "param_groups": ret_groups})

    @overload
    def step(self, closure: None = ...) -> None: ...

    @overload
    def step(self, closure: Callable[[], float]) -> float: ...

    def step(self, closure: Optional[Callable[[], float]] = None) -> Optional[float]:
        """
        Perform a single optimization step.

        This will call :meth:`torch.optim.Optimizer.step` on the wrapped
        optimizer.
        """
        return self._optimizer.step(closure=closure)

    @property
    def state(self) -> Mapping[torch.Tensor, Any]:
        return self._optimizer.state

    def load_state_dict(self, state_dict: dict[str, Any]) -> None:
        """
        Define the default behavior to load a state_dict for ``_NamedOptimizer``.

        Sample Code
        ```
            my_model = MyModule()
            optimizer = _NamedOptimizer(my_model.named_parameters(), Adagrad)
            ...

            optim_state_dict = optimizer.state_dict()
            ...
            ...

            optimizer.load_state_dict(optim_state_dict)
            ...
        ```
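
        If the optimizer state has not been initialized yet (e.g. no ``step`` has
        run), a hedged sketch of the restore flow is to materialize the state with
        ``init_state`` first:
        ```
            optimizer.init_state()
            optimizer.load_state_dict(optim_state_dict)
        ```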
        Args:
            state_dict (dict[str, Any]): A ``state_dict`` to load into the optimizer.
                Note that this state dict update is performed in place.

        .. note:: PyTorch uses lazy init to initialize the optim states.
            So it is possible that there is no optim state when the user calls
            ``load_state_dict``, and for ``_NamedOptimizer`` we are stricter:
            users can only call ``load_state_dict`` after the state is initialized.
            By doing this, we can validate the optim ``state_dict`` to be loaded.
        """
        new_state_dict = self._optimizer.state_dict()
        state_dict = self._pre_load_state_dict(state_dict)
        state = state_dict["state"]
        new_state = new_state_dict["state"]
        if len(new_state) == 0:
            raise ValueError(
                "Expects the optim to be initialized before load but found not initialized."
            )

        for idx, param_key in enumerate(self.ordered_param_keys):
            # Not every parameter has to appear in the loaded state_dict
            # (e.g. with conditional training), so skip the missing ones.
            if param_key not in state.keys():
                continue
            if len(state[param_key]) != len(new_state[idx]):
                raise ValueError(
                    f"Expects equal length as {len(new_state[idx])} "
                    f"for parameter {param_key} but found: {len(state[param_key])}"
                )
            # Copy every optimizer state entry of this parameter in place.
            for state_key, state_val in new_state[idx].items():
                if state_key not in state[param_key]:
                    raise ValueError(
                        f"Expects state {state_key} for parameter {param_key} but not found."
                    )

                src_state_val = state[param_key][state_key]
                if isinstance(state_val, ShardedTensor):
                    assert isinstance(src_state_val, ShardedTensor)
                    num_shards = len(state_val.local_shards())
                    num_new_shards = len(src_state_val.local_shards())
                    if num_shards != num_new_shards:
                        raise ValueError(
                            f"Expects equal number of shards as {num_new_shards} "
                            f"but found {num_shards} for {param_key}/{state_key}"
                        )
                    for shard, src_shard in zip(
                        state_val.local_shards(), src_state_val.local_shards()
                    ):
                        shard.tensor.detach().copy_(src_shard.tensor)
                elif isinstance(state_val, torch.Tensor):
                    assert isinstance(src_state_val, torch.Tensor)
                    state_val.detach().copy_(src_state_val)
                else:
                    new_state[idx][state_key] = deepcopy(src_state_val)

        # Load the param_groups of the state_dict.
        src_param_groups = state_dict["param_groups"]
        new_param_groups = new_state_dict["param_groups"]

        src_group_map = {}
        for group in src_param_groups:
            param_keys = list(group["params"])
            src_group_map[_gen_param_group_key(param_keys)] = group
        new_group_map = {}
        for new_group in new_param_groups:
            param_keys = [self.ordered_param_keys[k] for k in new_group["params"]]
            new_group_map[_gen_param_group_key(param_keys)] = new_group
        for group_key, new_group in new_group_map.items():
            # When not all parameters are used, not all groups appear in the state_dict.
            if group_key not in src_group_map:
                continue
            src_group = src_group_map[group_key]
            if len(src_group) != len(new_group):
                raise ValueError(
                    f"Expects equal param_group size as {len(new_group)} "
                    f"for group {group_key} but found {len(src_group)}."
                )
            for k in src_group:
                if k not in new_group:
                    raise ValueError(
                        f"Expects group key {k} to be in group {group_key} "
                        f"in `state_dict` but is missing."
                    )
                if k != "params":
                    new_group[k] = deepcopy(src_group[k])

        self._optimizer.load_state_dict(new_state_dict)

    def add_param_group(self, param_group: Mapping[str, Any]) -> None:
        """
        Add a param group to the :class:`_NamedOptimizer`'s ``param_groups``.

        Warning: This API is still in development and subject to change.
        """
        assert isinstance(param_group, dict), "param group must be a dict"

        params = param_group["params"]
        if isinstance(params, torch.Tensor):
            param_group["params"] = [params]
        else:
            param_group["params"] = list(params)

        param_to_key = {param: key for key, param in self.named_parameters.items()}
        for param in param_group["params"]:
            if param not in param_to_key:
                raise ValueError("some parameters are not in the module")
            self.ordered_param_keys.append(param_to_key[param])

        self._optimizer.add_param_group(param_group)
        # Keep param_groups in sync with the wrapped optimizer.
        self.param_groups = self._optimizer.param_groups

    def init_state(self) -> None:
        """
        Run a dummy optimizer step, which allows initializing the optimizer state because we do lazy init for most optimizers.

        This allows doing in-place loading of optimizer state from a checkpoint.
        """
        for param in self.named_parameters.values():
            if param.requires_grad:
                t = torch.zeros_like(param)
                param.grad = torch.autograd.Variable(t)
        # Calling ``step`` makes the wrapped optimizer materialize its state.
        self.step(closure=None)

    def _pre_load_state_dict(self, state_dict: dict[str, Any]) -> dict[str, Any]:
        # When the module is FSDP-wrapped, let FSDP translate the FQN-keyed
        # state_dict into the form the wrapped optimizer can load.
        if isinstance(self.module, FSDP):
            return FSDP.optim_state_dict_to_load(
                self.module, self._optimizer, state_dict, is_named_optimizer=True
            )
        return state_dict

    def _post_state_dict(self, state_dict: dict[str, Any]) -> dict[str, Any]:
        # When the module is FSDP-wrapped, let FSDP post-process the optimizer
        # state_dict before it is returned to the user.
        if isinstance(self.module, FSDP):
            state_dict = FSDP.optim_state_dict(self.module, self._optimizer, state_dict)
        return state_dict


def _gen_param_group_key(param_keys: list[str]) -> str:
    """Concatenate all param keys as a unique identifier for one param group."""
    return "/".join(sorted(param_keys))