

ECE 498 assignment help: Python design and programming

Date: 2024-11-15  Source: 合肥網 hfw.cc  Author: hfw.cc



ECE 498/598 Fall 2024, Homeworks 3 and 4
Remarks:
1. HW3&4: You can reduce the context length to ** if you are having trouble with the
training time.
2. HW3&4: During test evaluation, note that positional encodings for unseen/long
context are not trained. You are supposed to evaluate it as is. It is OK if it doesn’t
work well.
3. HW3&4: Comments are an important component of the HW grade. You are expected
to explain the experimental findings. If you don’t provide technically meaningful
comments, you might receive a lower score even if your code and experiments are
accurate.
4. The deadline for HW3 is November 11th at 11:59 PM, and the deadline for HW4 is
November 18th at 11:59 PM. For each assignment, please submit both your code and a
PDF report that includes your results (figures) for each question. You can generate the
PDF report from a Jupyter Notebook (.ipynb file) by adding comments in markdown
cells.
The objective of this assignment is to compare the transformer architecture with SSM-type
architectures (specifically Mamba [1]) on the associative recall problem. We provide
example code, recall.ipynb, which contains an implementation using a 2-layer
transformer. You will adapt this code to incorporate different positional encodings, use
Mamba layers, or modify dataset generation.
Background: As you may recall from class, associative recall (AR) assesses two abilities
of the model: the ability to locate relevant information and the ability to retrieve the
context around that information. The AR task can be understood via the following example:
given the input prompt X = [a 1 b 2 c 3 b], we wish the model to locate where the last
token b occurs earlier and output the associated value Y = 2. This is crucial for
memory-related tasks or bigram retrieval (e.g. ‘Baggins’ should follow ‘Bilbo’).
To proceed, let us formally define the associative recall task we will study in the HW.
Definition 1 (Associative Recall Problem) Let Q be the set of target queries with
cardinality |Q| = k. Consider a discrete input sequence X of the form
X = [. . . q v . . . q] where the query q appears exactly twice in the sequence and the
value v follows the first appearance of q. We say the model f solves AR(k) if f(X) = v
for all sequences X with q ∈ Q.
The induction head is a special case of the definition above where the query q is fixed
(i.e. Q is a singleton). The induction head is visualized in Figure 1. At the other
extreme, we can ask the model to solve AR for all queries in the vocabulary.
Problem Setting
Vocabulary: Let [K] = {1, . . . , K} be the token vocabulary. Obtain the embedding of
the vocabulary by randomly generating a K × d matrix V with IID N(0, 1) entries, then
normalizing its rows to unit length. Here d is the embedding dimension. The embedding of
the i-th token is V[i]. Use numpy.random.seed(0) to ensure reproducibility.
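The embedding construction described above can be sketched as follows. This is a minimal illustration, not the course's reference code: the function name `make_embeddings` is hypothetical, and because Python arrays are 0-indexed, token i in {1, ..., K} is assumed to map to row i - 1.

```python
import numpy as np

def make_embeddings(K: int, d: int, seed: int = 0) -> np.ndarray:
    """Return a K x d matrix of IID N(0, 1) entries with rows scaled to unit length."""
    np.random.seed(seed)                  # seed call named in the handout
    V = np.random.randn(K, d)             # IID standard normal entries
    V /= np.linalg.norm(V, axis=1, keepdims=True)  # normalize each row
    return V

V = make_embeddings(K=16, d=8)
# Assumption: token i (1-indexed) uses row i - 1 of V.
embedding_of_token_3 = V[3 - 1]
```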
Experimental variables: Finally, for the AR task, Q will simply be the first M elements
of the vocabulary. During experiments, K, d, M are under our control. Besides this we will
also play with two other variables:
• Context length: We will train these models up to context length L. However, we
will evaluate with up to 3L. This is to test the generalization of the model to unseen
lengths.
• Delay: In the basic AR problem, the value v immediately follows q. Instead, we will
introduce a delay variable where v will appear τ tokens after q. τ = 1 is the standard.
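To make the target of the task concrete, the rule "find the earlier occurrence of the final token and return the token τ positions after it" can be written as a few lines of Python. This is only a sketch of the ground-truth rule, not a model; the name `ar_target` is hypothetical.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")

def ar_target(tokens: Sequence[T], tau: int = 1) -> T:
    """Return the value that sits tau positions after the first occurrence
    of the query, where the query is the final token of the sequence."""
    q = tokens[-1]
    first = tokens.index(q)               # earliest position of the query
    if first + tau >= len(tokens) - 1:
        raise ValueError("no value slot between the first query and the final query")
    return tokens[first + tau]

# The example from the background section: X = [a 1 b 2 c 3 b], tau = 1.
print(ar_target(list("a1b2c3b")))         # prints 2
```

With a larger delay, for example `ar_target([5, 9, 7, 3, 5], tau=2)`, the returned value is the token two positions after the first 5.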
Models: The motivation behind this HW is to reproduce the results in the Mamba paper.
However, we will also go beyond their evaluations and identify weaknesses of both the
transformer and Mamba architectures. Specifically, we will consider the following models
in our evaluations:
Figure 1: We will work on the associative recall (AR) problem. AR problem requires the
model to retrieve the value associated with all queries whereas the induction head requires
the same for a specific query. Thus, the latter is an easier problem. The figure above is
directly taken from the Mamba paper [1]. The yellow-shaded regions highlight the focus of
this homework.
• Transformer: We will use the transformer architecture with 2 attention layers (no
MLP). We will try the following positional encodings: (i) learned PE (provided code),
(ii) Rotary PE (RoPE), (iii) NoPE (no positional encoding)
• Mamba: We will use the Mamba architecture with 2 layers.
• Hybrid Model: We will use an initial Mamba layer followed by an attention layer.
No positional encoding is used.
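For orientation, here is a minimal NumPy sketch of the rotary idea behind option (ii): each consecutive pair of feature dimensions is rotated by an angle proportional to the token position, and the rotation is applied to queries and keys before the attention dot product. The function name `apply_rope`, the interleaved pairing, and the base of 10000 are assumptions made for illustration; a public repo may lay out the dimensions differently.

```python
import numpy as np

def apply_rope(x: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Rotate pairs (x[:, 2j], x[:, 2j+1]) at position p by angle p * base**(-2j/dim).

    x has shape (seq_len, dim) with dim even; positions are 0, 1, ..., seq_len - 1.
    """
    x = np.asarray(x, dtype=float)
    seq_len, dim = x.shape
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)           # (dim/2,)
    angles = np.arange(seq_len)[:, None] * inv_freq[None, :]   # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin                  # 2-D rotation, first coord
    out[:, 1::2] = x_even * sin + x_odd * cos                  # 2-D rotation, second coord
    return out

# Same query/key vector at every position: scores then depend only on the offset.
rng = np.random.default_rng(0)
q_vec, k_vec = rng.standard_normal(8), rng.standard_normal(8)
Q = apply_rope(np.tile(q_vec, (8, 1)))
Kmat = apply_rope(np.tile(k_vec, (8, 1)))
scores = Q @ Kmat.T
```

Because each position applies a pure rotation, `scores[m, n]` depends only on `m - n`, which is the property that distinguishes rotary encodings from learned absolute encodings.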
Hybrid architectures are inspired by the Mamba paper as well as [2] which observes the
benefit of starting the model with a Mamba layer. You should use public GitHub repos to
find implementations (e.g. RoPE encoding or Mamba layer). As a suggestion, you can use
this GitHub Repo for the Mamba model.
Generating training dataset: During training, you train with minibatch SGD (e.g. with
batch size 64) until satisfactory convergence. You can generate the training sequences for
AR as follows given (K, d, M, L, τ):
1. Training sequence length is equal to L.
2. Sample a query q ∈ Q and a value v ∈ [K] uniformly at random, independently. Recall
that size of Q is |Q| = M.
3. Place q at the end of the sequence and place another q at an index i chosen uniformly
at random from 1 to L − τ.
4. Place value token at the index i + τ.
5. Sample other tokens IID from [K] \ {q}, i.e. other tokens are drawn uniformly at random
but are not equal to q.
6. Set label token Y = v.
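The six steps can be sketched in NumPy as below. The function names are hypothetical, and one detail is an assumption rather than something the handout states: the first query index is drawn from 1 to L − τ − 1 (1-indexed) instead of 1 to L − τ, so that the value token never lands on the final position, which is reserved for the second query. Adjust the range if your reading of step 3 differs.

```python
import numpy as np

def sample_ar_sequence(K: int, M: int, L: int, tau: int, rng: np.random.Generator):
    """Return (x, y): x is a length-L array of tokens in 1..K, y is the label v."""
    if L < tau + 2:
        raise ValueError("need L >= tau + 2 to fit query, value, and final query")
    q = int(rng.integers(1, M + 1))            # step 2: query from Q = {1, ..., M}
    v = int(rng.integers(1, K + 1))            # step 2: value from [K]
    others = np.array([t for t in range(1, K + 1) if t != q])
    x = rng.choice(others, size=L)             # step 5: filler tokens from [K] \ {q}
    # Assumption: i ranges over 1..L-tau-1 so that i + tau stays before position L.
    i = int(rng.integers(1, L - tau))          # upper bound is exclusive
    x[i - 1] = q                               # step 3: first occurrence of the query
    x[i - 1 + tau] = v                         # step 4: value tau tokens later
    x[L - 1] = q                               # step 3: query at the end
    return x, v                                # step 6: label Y = v

def sample_batch(batch_size: int, K: int, M: int, L: int, tau: int, rng):
    """Stack independent samples into (batch_size, L) inputs and (batch_size,) labels."""
    pairs = [sample_ar_sequence(K, M, L, tau, rng) for _ in range(batch_size)]
    X = np.stack([x for x, _ in pairs])
    Y = np.array([y for _, y in pairs])
    return X, Y

rng = np.random.default_rng(0)
X, Y = sample_batch(batch_size=64, K=16, M=1, L=64, tau=1, rng=rng)
```

Since v is drawn from all of [K], it can occasionally equal q; the sketch keeps that behavior because step 2 samples q and v independently.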
Test evaluation: The test dataset is generated the same way as above. However, we will evaluate on all sequence
lengths from τ + 1 to 3L. Note that τ + 2 is the shortest possible sequence.
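A length sweep of this kind can be organized as in the sketch below. Everything here is illustrative: `predict_fn` stands in for a trained model's argmax prediction, `sample_fn` for a test-sequence generator, and `toy_sample` and `oracle` are hypothetical stand-ins (a fixed-query sampler and the exact lookup rule) used only so the example runs on its own. The sweep starts at τ + 2, the shortest sequence that can hold both queries and the value.

```python
import numpy as np

def accuracy_by_length(predict_fn, sample_fn, lengths, n_samples=100, seed=0):
    """Return {length: accuracy} by drawing n_samples test sequences per length."""
    rng = np.random.default_rng(seed)
    acc = {}
    for n in lengths:
        hits = 0
        for _ in range(n_samples):
            x, y = sample_fn(n, rng)
            hits += int(predict_fn(x) == y)
        acc[n] = hits / n_samples
    return acc

def toy_sample(n, rng, K=16, tau=1, q=1):
    """Hypothetical induction-head sampler (single query q = 1) of length n."""
    x = rng.integers(2, K + 1, size=n)        # filler tokens, never equal to q
    i = int(rng.integers(0, n - tau - 1))     # 0-indexed first-query position
    v = int(rng.integers(1, K + 1))
    x[i], x[i + tau], x[-1] = q, v, q
    return x, v

def oracle(x, tau=1):
    """Exact lookup rule; a trained model's prediction would replace this."""
    first = int(np.argmax(x == x[-1]))        # first position holding the query
    return int(x[first + tau])

L, tau = 8, 1
acc = accuracy_by_length(oracle, toy_sample, range(tau + 2, 3 * L + 1), n_samples=20)
```

Plotting `acc` against its keys for each model gives the accuracy-versus-context-length curves; lengths above L are the unseen ones.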
Empirical Evidence from Mamba Paper: Table 2 of [1] demonstrates that Mamba can do
a good job on the induction head problem i.e. AR with single query. Additionally, Mamba
is the only model that exhibits length generalization, that is, even if you train it up to context
length L, it can still solve AR for context length beyond L. On the other hand, since Mamba
is inherently a recurrent model, it may not solve the AR problem in its full generality. This
motivates the question: What are the tradeoffs between Mamba and transformer, and can
hybrid models help improve performance over both?
Your assignments are as follows. For each problem, make sure to submit the associated
code. The code can be in separate cells (clearly commented) in a single Jupyter/Python file.
Grading structure:
• Problem 1 will count as your HW3 grade. This only involves Induction Head
experiments (i.e. M = 1).
• Problems 2 and 3 will count as your HW4 grade.
• You will make a single submission.
Problem 1 (50=25+15+10pts). Set K = 16, d = 8, L = ** or L = 64.
• Train all models on the induction heads problem (M = 1, τ = 1). After training,
evaluate the test performance and plot the accuracy of all models as a function of
the context length (similar to Table 2 of [1]). In total, you will be plotting 5 curves
(3 Transformers, 1 Mamba, 1 Hybrid). Comment on the findings and compare the
performance of the models including length generalization ability.
• Repeat the experiment above with delay τ = 5. Comment on the impact of delay.
• Which models converge faster during training? Provide a plot of the convergence rate
where the x-axis is the number of iterations and the y-axis is the AR accuracy over a
test batch. Make sure to specify the batch size you are using (ideally use ** or 64).
Problem 2 (30pts). Set K = 16, d = 8, L = ** or L = 64. We will train Mamba, Transformer
with RoPE, and Hybrid. Set τ = 1 (standard AR).
• Train Mamba models for M = 4, 8, 16. Note that M = 16 is the full AR (retrieve any
query). Comment on the results.
• Train Transformer models for M = 4, 8, 16. Comment on the results and compare
them against Mamba’s behavior.
• Train the Hybrid model for M = 4, 8, 16. Comment and compare.
Problem 3 (20=15+5pts). Set K = 16, d = 64, L = ** or L = 64. We will only train
Mamba models.
• Set τ = 1 (standard AR). Train Mamba models for M = 4, 8, 16. Compare against the
corresponding results of Problem 2. How does the embedding dimension d impact results?
• Train a Mamba model for M = 16 for τ = 10. Comment on any differences.



